Working with Economic data in Python

This notebook will introduce you to working with data in Python. You can use packages like NumPy to manipulate arrays and matrices and to do computations with them (see my Introduction to Python). But given the needs of economists (and other scientists), it will be advantageous for us to use pandas. pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python. pandas allows you to import and process data in many useful ways. It integrates well with other packages that complement it, making it a very powerful tool for data analysis.

With pandas you can

  1. Import many types of data, including
    • CSV files
    • Tab or other types of delimited files
    • Excel (xls, xlsx) files
    • Stata files
  2. Open files directly from a website
  3. Merge, select, join data
  4. Perform statistical analyses
  5. Create plots of your data

and much more. Let's start by importing pandas and using it to download some data and create some of the figures from the lecture notes. Note that when importing pandas it is customary to assign it the alias pd. I suggest you follow this convention, which will make it easier to use other people's code and snippets.

In [1]:
# Let's import pandas and some other basic packages we will use
# (%pylab is deprecated, so we load matplotlib explicitly)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Working with Pandas

The basic structures in pandas are pd.Series and pd.DataFrame. You can think of a pd.Series as a labeled vector that contains data and comes with a large set of methods that are easy to apply to it. A pd.DataFrame is similar to a table/matrix of multidimensional data where each column contains a pd.Series. I know...this may not explain much, so let's start with some actual examples. Let's create two series, one containing some country names and another containing some fictitious data.

In [2]:
countries = pd.Series(['Colombia', 'Turkey', 'USA', 'Germany', 'Chile'], name='country')
print(countries)
print('\n', 'There are ', countries.shape[0], 'countries in this series.')
0    Colombia
1      Turkey
2         USA
3     Germany
4       Chile
Name: country, dtype: object

 There are  5 countries in this series.

Notice that we have assigned the series a name that is different from the name of the variable holding it. Our print(countries) statement shows the series and its contents, its name, and the dtype of the data it contains. Here our series consists only of strings, so pandas assigns it the object dtype. This is not important for now, but we will use dtypes later to convert data between types, e.g., strings to integers or floats or the other way around.

Now let's create the fictitious data using NumPy's random number generator, which we already learned about. Setting the seed makes the draws reproducible.

In [3]:
np.random.seed(123456)
data = pd.Series(np.random.normal(size=(countries.shape)), name='noise')
print(data)
print('\n', 'The average in this sample is ', data.mean())
0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
Name: noise, dtype: float64

 The average in this sample is  -0.24926597871826645

Here we have used the series' mean() method to compute its mean. A series has many other methods and properties, including std(), shape, count(), max(), min(), etc. You can access them by writing series.name_of_method_or_property. Methods need parentheses, while properties such as shape do not. To see what is available, type the name of the series followed by a dot and hit Tab.
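Here is a minimal sketch of some of those other methods. It uses a small made-up series, not the random draws above, so the numbers are easy to check by hand:

```python
import pandas as pd

# A small illustrative series (made-up values)
s = pd.Series([2.0, 4.0, 4.0, 5.0], name='example')

print(s.count())        # number of non-missing values
print(s.min(), s.max())
print(s.std())          # sample standard deviation (ddof=1 by default)
print(s.shape)          # a property, so no parentheses
print(s.describe())     # several summary statistics at once
```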

Let's create a pd.DataFrame using these two series.

In [4]:
df = pd.DataFrame([countries, data])
df
Out[4]:
0 1 2 3 4
country Colombia Turkey USA Germany Chile
noise 0.469112 -0.282863 -1.509059 -1.135632 1.212112

Not exactly what we'd like: each series became a row. But don't worry, we can just transpose the dataframe so that each country and its data form a row.

In [5]:
df = df.T
df
Out[5]:
country noise
0 Colombia 0.469112
1 Turkey -0.282863
2 USA -1.509059
3 Germany -1.135632
4 Chile 1.212112
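Transposing works, but there is a side effect. Each column of the untransposed frame mixed a string and a float, so after the transpose every column has the generic object dtype. If you want each column to keep its own dtype, you can build the frame column by column from a dictionary of series, or with pd.concat(..., axis=1). The series below are small stand-ins for countries and data:

```python
import pandas as pd

countries = pd.Series(['Colombia', 'Turkey', 'USA'], name='country')
noise = pd.Series([0.5, -0.3, -1.5], name='noise')

# Building the frame from rows and transposing leaves every column as dtype object
df_t = pd.DataFrame([countries, noise]).T
print(df_t.dtypes)

# Building it column by column keeps the original dtypes
df_cols = pd.DataFrame({'country': countries, 'noise': noise})
print(df_cols.dtypes)

# Equivalent: concat along columns uses the series names as column names
df_concat = pd.concat([countries, noise], axis=1)
```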

Now let us add some more data to this dataframe. This is easily done by defining new columns. Let's create the square of noise, the sum of noise and its square, and the length of each country's name.

In [6]:
df['noise_sq'] = df.noise**2
df['noise and its square'] = df.noise + df.noise_sq
df['name length'] = df.country.apply(len)
df
Out[6]:
country noise noise_sq noise and its square name length
0 Colombia 0.469112 0.220066 0.689179 8
1 Turkey -0.282863 0.080012 -0.202852 6
2 USA -1.509059 2.277258 0.768199 3
3 Germany -1.135632 1.289661 0.154029 7
4 Chile 1.212112 1.469216 2.681328 5

This shows some of the ways in which you can create new columns. The apply method is especially useful: it applies a function to each element of the series. You can also apply a function to a whole dataframe, column by column or row by row (axis=1). Working row by row is useful when a computation uses several columns.
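The following is a small sketch of applying a function row by row with axis=1. The dataframe and the 'label' column are illustrative only. The last line shows that simple arithmetic is usually better done with vectorized column operations:

```python
import pandas as pd

df = pd.DataFrame({'country': ['Colombia', 'Turkey', 'USA'],
                   'noise': [0.5, -0.3, -1.5]})

# axis=1 passes each row (as a Series) to the function
df['label'] = df.apply(lambda row: f"{row['country']}: {row['noise']:+.1f}", axis=1)

# For simple arithmetic, vectorized column operations are faster than apply
df['noise_abs'] = df['noise'].abs()
print(df)
```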

Let's see some other ways in which we can interact with dataframes. First, let's select some observations, e.g., all countries in South America.

In [7]:
# Let's create a list of South American countries
south_america = ['Colombia', 'Chile']
# Select the rows for South American countries
df.loc[df.country.apply(lambda x: x in south_america)]
Out[7]:
country noise noise_sq noise and its square name length
0 Colombia 0.469112 0.220066 0.689179 8
4 Chile 1.212112 1.469216 2.681328 5

Now let's use this to create a dummy indicating whether a country belongs to South America. To understand what is going on, let's show the result of the condition used for selecting rows.

In [8]:
df.country.apply(lambda x: x in south_america)
Out[8]:
0     True
1    False
2    False
3    False
4     True
Name: country, dtype: bool

So in the previous selection we told pandas which rows to include by passing a series of booleans (True, False). We can use this result to create the dummy. We only need to convert the booleans to integers, so that True becomes 1 and False becomes 0.

In [9]:
df['South America'] = df.country.apply(lambda x: x in south_america).astype(int)
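pandas also has a built-in method for this membership test, Series.isin. The sketch below uses a small stand-in for df and checks that isin gives the same mask as the apply/lambda version:

```python
import pandas as pd

df = pd.DataFrame({'country': ['Colombia', 'Turkey', 'USA', 'Germany', 'Chile']})
south_america = ['Colombia', 'Chile']

# isin returns the same boolean mask as apply(lambda x: x in south_america)
mask = df['country'].isin(south_america)
mask_apply = df['country'].apply(lambda x: x in south_america)

df['South America'] = mask.astype(int)
print(df[mask])
```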

Now, let's plot the various series in the dataframe

In [10]:
df.plot()
Out[10]:
<AxesSubplot:>

Neither pretty nor useful. Notice that pandas used the row numbers as x-axis labels. Let's change the row labels, which are stored in the dataframe's index, by setting the country names as the index.

In [11]:
df = df.set_index('country')
print(df)
df.plot()
             noise  noise_sq noise and its square  name length  South America
country                                                                      
Colombia  0.469112  0.220066             0.689179            8              1
Turkey   -0.282863  0.080012            -0.202852            6              0
USA      -1.509059  2.277258             0.768199            3              0
Germany  -1.135632  1.289661             0.154029            7              0
Chile     1.212112  1.469216             2.681328            5              1
Out[11]:
<AxesSubplot:xlabel='country'>

Better, but still not very informative. Below we will improve on this when we work with some real data.

Notice that the set_index method has replaced the index with the country names. This makes it easy to select data by label. E.g., to see only the row for Colombia, we can use

In [12]:
df.loc['Colombia']
Out[12]:
noise                   0.469112
noise_sq                0.220066
noise and its square    0.689179
name length                    8
South America                  1
Name: Colombia, dtype: object
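loc also accepts lists of labels, with rows first and columns second. Below is a short sketch on a stand-in dataframe indexed by country:

```python
import pandas as pd

df = pd.DataFrame({'noise': [0.47, -0.28, 1.21],
                   'name length': [8, 6, 5]},
                  index=pd.Index(['Colombia', 'Turkey', 'Chile'], name='country'))

# Lists of labels return a DataFrame: rows first, then columns
print(df.loc[['Colombia', 'Chile'], ['noise']])

# A single row label and a single column label return a scalar
print(df.loc['Turkey', 'name length'])
```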

Getting data

One of the nice features of pandas and its ecosystem is that it makes obtaining data very easy. To illustrate this, and also to revisit some of the basic facts of comparative development, let's download some data from various sources. Some sources require you to create an account before you can access and download the data. Sometimes the process is very simple and does not require an actual project. In other cases you need to propose a project and have it approved, usually because of privacy concerns with micro-data. Don't be afraid: all these sources are free and widely used in research, so it is good that you learn to use them. Let's start with a list of useful sources.

Country-level economic data

Censuses, Surveys, and other micro-level data

  • IPUMS: provides census and survey data from around the world integrated across time and space.
  • General Social Survey provides survey data on what Americans think and feel about such issues as national spending priorities, crime and punishment, intergroup relations, and confidence in institutions.
  • European Social Survey provides survey measures on the attitudes, beliefs and behaviour patterns of diverse European populations in more than thirty nations.
  • UK Data Service is the UK’s largest collection of social, economic and population data resources.
  • SHRUG, the Socioeconomic High-resolution Rural-Urban Geographic Platform for India, provides access to dozens of datasets covering India’s 500,000 villages and 8000 towns using a common set of geographic identifiers that span 25 years.

Divergence - Big time

To study the divergence across countries, let's download and plot historical GDP and population data. To keep the data and avoid downloading it from scratch every time, we'll create a folder ./data in the current directory and save each file there. We'll also create a folder ./graphs for the figures. We'll only download a file if no local copy exists. We'll use the os package to create the directories.

Setting up paths

In [13]:
import os

# Folders for downloaded data and exported figures
pathout = './data/'
pathgraphs = './graphs/'

# exist_ok=True: do nothing if the folder already exists
os.makedirs(pathout, exist_ok=True)
os.makedirs(pathgraphs, exist_ok=True)

Download New Maddison Project Data

In [14]:
try:
    # Use the local copies if we downloaded them before
    maddison_new = pd.read_stata(pathout + 'Maddison2020.dta')
    maddison_new_region = pd.read_stata(pathout + 'Maddison2018_region.dta')
    maddison_new_1990 = pd.read_stata(pathout + 'Maddison2018_1990.dta')
except FileNotFoundError:
    # Otherwise download each file and save a copy in ./data/
    maddison_new = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2020.dta')
    maddison_new.to_stata(pathout + 'Maddison2020.dta', write_index=False, version=117)
    maddison_new_region = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018_region_data.dta')
    maddison_new_region.to_stata(pathout + 'Maddison2018_region.dta', write_index=False, version=117)
    maddison_new_1990 = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018_1990bm.dta')
    maddison_new_1990.to_stata(pathout + 'Maddison2018_1990.dta', write_index=False, version=117)
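The same "read locally, else download and cache" pattern can be wrapped in a small function so it is reused per file. Below is a minimal sketch. The helper name read_stata_cached is hypothetical, not part of pandas. The demo uses a temporary local file in place of the Maddison URL, so it runs without a network connection:

```python
import os
import tempfile

import pandas as pd


def read_stata_cached(local_path, source):
    """Hypothetical helper: read local_path if it exists, else read source (URL or path) and cache it."""
    if os.path.exists(local_path):
        return pd.read_stata(local_path)
    df = pd.read_stata(source)  # read_stata accepts URLs as well as local paths
    df.to_stata(local_path, write_index=False, version=117)  # save a copy for next time
    return df


# Demo: a temporary "remote" file stands in for the download URL
tmpdir = tempfile.mkdtemp()
remote = os.path.join(tmpdir, 'remote.dta')
pd.DataFrame({'country': ['Chile'], 'year': [2018]}).to_stata(remote, write_index=False)
local = os.path.join(tmpdir, 'cached.dta')

first = read_stata_cached(local, remote)   # reads the "remote" file and writes the cache
second = read_stata_cached(local, remote)  # reads the cached copy
```

Checking each file separately also means a missing region file does not force the other two to be downloaded again.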
In [15]:
maddison_new
Out[15]:
countrycode country year gdppc pop
0 AFG Afghanistan 1820 NaN 3280.00000
1 AFG Afghanistan 1870 NaN 4207.00000
2 AFG Afghanistan 1913 NaN 5730.00000
3 AFG Afghanistan 1950 1156.0000 8150.00000
4 AFG Afghanistan 1951 1170.0000 8284.00000
... ... ... ... ... ...
21677 ZWE Zimbabwe 2014 1594.0000 13313.99205
21678 ZWE Zimbabwe 2015 1560.0000 13479.13812
21679 ZWE Zimbabwe 2016 1534.0000 13664.79457
21680 ZWE Zimbabwe 2017 1582.3662 13870.26413
21681 ZWE Zimbabwe 2018 1611.4052 14096.61179

21682 rows × 5 columns

This dataset is in long format: each row is one country-year observation. Also, let's make sure that year is stored as a standard integer type.

In [16]:
maddison_new['year'] = maddison_new.year.astype(int)
maddison_new
Out[16]:
countrycode country year gdppc pop
0 AFG Afghanistan 1820 NaN 3280.00000
1 AFG Afghanistan 1870 NaN 4207.00000
2 AFG Afghanistan 1913 NaN 5730.00000
3 AFG Afghanistan 1950 1156.0000 8150.00000
4 AFG Afghanistan 1951 1170.0000 8284.00000
... ... ... ... ... ...
21677 ZWE Zimbabwe 2014 1594.0000 13313.99205
21678 ZWE Zimbabwe 2015 1560.0000 13479.13812
21679 ZWE Zimbabwe 2016 1534.0000 13664.79457
21680 ZWE Zimbabwe 2017 1582.3662 13870.26413
21681 ZWE Zimbabwe 2018 1611.4052 14096.61179

21682 rows × 5 columns
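Converting with astype(int) works here because year has no missing values. If a column does contain missing values, astype(int) fails, because NaN has no integer representation. Here is a small sketch with made-up values showing pandas' nullable "Int64" dtype as an alternative:

```python
import numpy as np
import pandas as pd

years = pd.Series([1820.0, 1870.0, 1913.0])
print(years.astype(int))  # fine: no missing values

years_missing = pd.Series([1820.0, np.nan, 1913.0])
# years_missing.astype(int) would raise an error because of the NaN;
# the nullable "Int64" dtype stores integers and marks the gap as <NA>
years_nullable = years_missing.astype('Int64')
print(years_nullable)
```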

Original Maddison Data

Now, let's download, save and read the original Maddison database. The original file is an Excel file with different data on each sheet, so we need a different method to get all the data.

In [17]:
import urllib.request

# Download the file only if we don't already have a local copy
if not os.path.exists(pathout + 'Maddison_original.xlsx'):
    dataurl = "https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/md2010_horizontal.xlsx"
    urllib.request.urlretrieve(dataurl, pathout + 'Maddison_original.xlsx')

Some data munging

This dataset is not nicely structured for importing, as you can see if you open it in Excel. I suggest you do so to better see what is going on. Notice that the first two rows contain no data, every second column is empty, and there are a few empty rows. Let's import the data and clean it so we can plot and analyse it better.

In [18]:
maddison_old_pop = pd.read_excel(pathout + 'Maddison_original.xlsx', sheet_name="Population", skiprows=2)
maddison_old_pop
Out[18]:
Unnamed: 0 1 Unnamed: 2 1000 Unnamed: 4 1500 Unnamed: 6 1600 Unnamed: 8 1700 ... 2002 2003 2004 2005 2006 2007 2008 2009 Unnamed: 201 2030
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 500.0 NaN 700.0 NaN 2000.0 NaN 2500.0 NaN 2500.0 ... 8148.312 8162.656 8174.762 8184.691 8192.880 8199.783 8205.533 8210 NaN 8120.000
2 Belgium 300.0 NaN 400.0 NaN 1400.0 NaN 1600.0 NaN 2000.0 ... 10311.970 10330.824 10348.276 10364.388 10379.067 10392.226 10403.951 10414 NaN 10409.000
3 Denmark 180.0 NaN 360.0 NaN 600.0 NaN 650.0 NaN 700.0 ... 5374.693 5394.138 5413.392 5432.335 5450.661 5468.120 5484.723 5501 NaN 5730.488
4 Finland 20.0 NaN 40.0 NaN 300.0 NaN 400.0 NaN 400.0 ... 5193.039 5204.405 5214.512 5223.442 5231.372 5238.460 5244.749 5250 NaN 5201.445
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
273 Guadeloupe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 435.739 440.189 444.515 448.713 452.776 456.698 460.486 n.a. NaN 523.493
274 Guyana (Fr.) NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 182.333 186.917 191.309 195.506 199.509 203.321 206.941 n.a. NaN 272.781
275 Martinique NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 422.277 425.966 429.510 432.900 436.131 439.202 442.119 n.a. NaN 486.714
276 Reunion NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 743.981 755.171 766.153 776.948 787.584 798.094 808.506 n.a. NaN 1025.217
277 Total NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1784.330 1808.243 1831.487 1854.067 1876.000 1897.315 1918.052 n.a. NaN 2308.205

278 rows × 203 columns

In [19]:
maddison_old_gdppc = pd.read_excel(pathout + 'Maddison_original.xlsx', sheet_name="PerCapita GDP", skiprows=2)
maddison_old_gdppc
Out[19]:
Unnamed: 0 1 Unnamed: 2 1000 Unnamed: 4 1500 Unnamed: 6 1600 Unnamed: 8 1700 ... 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 425.000000 NaN 425.000000 NaN 707 NaN 837.200000 NaN 993.200000 ... 20065.093878 20691.415561 20812.893753 20955.874051 21165.047259 21626.929322 22140.725899 22892.682427 23674.041130 24130.547035
2 Belgium 450.000000 NaN 425.000000 NaN 875 NaN 975.625000 NaN 1144.000000 ... 19964.428266 20656.458570 20761.238278 21032.935511 21205.859281 21801.602508 22246.561977 22881.632810 23446.949672 23654.763464
3 Denmark 400.000000 NaN 400.000000 NaN 738.333333 NaN 875.384615 NaN 1038.571429 ... 22254.890572 22975.162513 23059.374968 23082.620719 23088.582457 23492.664119 23972.564284 24680.492880 24995.245167 24620.568805
4 Finland 400.000000 NaN 400.000000 NaN 453.333333 NaN 537.500000 NaN 637.500000 ... 18855.985066 19770.363126 20245.896529 20521.702225 20845.802738 21574.406196 22140.573208 23190.283543 24131.519569 24343.586318
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
190 Total Africa 472.352941 NaN 424.767802 NaN 413.709504 NaN 422.071584 NaN 420.628684 ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474
191 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
192 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
193 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
194 World Average 466.752281 NaN 453.402162 NaN 566.389464 NaN 595.783856 NaN 614.853602 ... 5833.255492 6037.675887 6131.705471 6261.734267 6469.119575 6738.281333 6960.031035 7238.383483 7467.648232 7613.922924

195 rows × 200 columns

Let's start by renaming the first column, which has the region/country names

In [20]:
maddison_old_pop.rename(columns={'Unnamed: 0':'Country'}, inplace=True)
maddison_old_gdppc.rename(columns={'Unnamed: 0':'Country'}, inplace=True)

Now let's drop all the columns that do not have data

In [21]:
# Keep only the columns whose label does not start with 'Unnamed' (the empty spacer columns)
maddison_old_pop = maddison_old_pop[[col for col in maddison_old_pop.columns if not str(col).startswith('Unnamed')]]
maddison_old_gdppc = maddison_old_gdppc[[col for col in maddison_old_gdppc.columns if not str(col).startswith('Unnamed')]]
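Here is a vectorized sketch of the same column filter, plus dropping rows that are entirely empty. It uses a toy frame that mimics the spreadsheet layout, with made-up values:

```python
import numpy as np
import pandas as pd

# Toy frame: one empty spacer column and one empty row, like the Excel sheet
raw = pd.DataFrame({'Country': ['Western Europe', 'Austria', np.nan],
                    1500: [np.nan, 2000.0, np.nan],
                    'Unnamed: 2': [np.nan, np.nan, np.nan],
                    1600: [np.nan, 2500.0, np.nan]})

# Boolean mask over column labels: keep those that do not start with 'Unnamed'
keep = ~raw.columns.astype(str).str.startswith('Unnamed')
clean = raw.loc[:, keep]

# Rows where every value is missing carry no information
clean = clean.dropna(how='all')
print(clean)
```

Note that the region header row (here 'Western Europe') survives dropna(how='all') because it has a label. Rows like that are dealt with separately below.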

Now, let's change the name of the columns so they reflect the underlying variable

In [22]:
maddison_old_pop.columns = ['Country'] + ['pop_'+str(col) for col in maddison_old_pop.columns[1:]]
maddison_old_gdppc.columns = ['Country'] + ['gdppc_'+str(col) for col in maddison_old_gdppc.columns[1:]]
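Here is an alternative sketch of the renaming using DataFrame.add_prefix, shown on a toy frame. Country is moved into the index first so that only the year columns get the prefix:

```python
import pandas as pd

pop = pd.DataFrame({'Country': ['Austria', 'Belgium'],
                    1500: [2000.0, 1400.0],
                    1600: [2500.0, 1600.0]})

# Prefix only the year columns, then move Country back to a regular column
pop = pop.set_index('Country').add_prefix('pop_').reset_index()
print(pop.columns.tolist())
```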
In [23]:
maddison_old_pop
Out[23]:
Country pop_1 pop_1000 pop_1500 pop_1600 pop_1700 pop_1820 pop_1821 pop_1822 pop_1823 ... pop_2001 pop_2002 pop_2003 pop_2004 pop_2005 pop_2006 pop_2007 pop_2008 pop_2009 pop_2030
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 500.0 700.0 2000.0 2500.0 2500.0 3369.0 3386.0 3402.0 3419.0 ... 8131.690 8148.312 8162.656 8174.762 8184.691 8192.880 8199.783 8205.533 8210 8120.000
2 Belgium 300.0 400.0 1400.0 1600.0 2000.0 3434.0 3464.0 3495.0 3526.0 ... 10291.679 10311.970 10330.824 10348.276 10364.388 10379.067 10392.226 10403.951 10414 10409.000
3 Denmark 180.0 360.0 600.0 650.0 700.0 1155.0 1167.0 1179.0 1196.0 ... 5355.826 5374.693 5394.138 5413.392 5432.335 5450.661 5468.120 5484.723 5501 5730.488
4 Finland 20.0 40.0 300.0 400.0 400.0 1169.0 1186.0 1202.0 1219.0 ... 5180.309 5193.039 5204.405 5214.512 5223.442 5231.372 5238.460 5244.749 5250 5201.445
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
273 Guadeloupe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 431.170 435.739 440.189 444.515 448.713 452.776 456.698 460.486 n.a. 523.493
274 Guyana (Fr.) NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 177.562 182.333 186.917 191.309 195.506 199.509 203.321 206.941 n.a. 272.781
275 Martinique NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 418.454 422.277 425.966 429.510 432.900 436.131 439.202 442.119 n.a. 486.714
276 Reunion NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 732.570 743.981 755.171 766.153 776.948 787.584 798.094 808.506 n.a. 1025.217
277 Total NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1759.756 1784.330 1808.243 1831.487 1854.067 1876.000 1897.315 1918.052 n.a. 2308.205

278 rows × 197 columns

In [24]:
maddison_old_gdppc
Out[24]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1821 gdppc_1822 gdppc_1823 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 425.000000 425.000000 707 837.200000 993.200000 1218.165628 NaN NaN NaN ... 20065.093878 20691.415561 20812.893753 20955.874051 21165.047259 21626.929322 22140.725899 22892.682427 23674.041130 24130.547035
2 Belgium 450.000000 425.000000 875 975.625000 1144.000000 1318.870122 NaN NaN NaN ... 19964.428266 20656.458570 20761.238278 21032.935511 21205.859281 21801.602508 22246.561977 22881.632810 23446.949672 23654.763464
3 Denmark 400.000000 400.000000 738.333333 875.384615 1038.571429 1273.593074 1320.479863 1326.547922 1307.692308 ... 22254.890572 22975.162513 23059.374968 23082.620719 23088.582457 23492.664119 23972.564284 24680.492880 24995.245167 24620.568805
4 Finland 400.000000 400.000000 453.333333 537.500000 637.500000 781.009410 NaN NaN NaN ... 18855.985066 19770.363126 20245.896529 20521.702225 20845.802738 21574.406196 22140.573208 23190.283543 24131.519569 24343.586318
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
190 Total Africa 472.352941 424.767802 413.709504 422.071584 420.628684 419.755914 NaN NaN NaN ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474
191 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
192 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
193 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
194 World Average 466.752281 453.402162 566.389464 595.783856 614.853602 665.735330 NaN NaN NaN ... 5833.255492 6037.675887 6131.705471 6261.734267 6469.119575 6738.281333 6960.031035 7238.383483 7467.648232 7613.922924

195 rows × 195 columns

Let's choose the rows that hold the aggregates by region for the main regions of the world.

In [25]:
# Keep only the rows whose label contains "Total" (the aggregates)
gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.apply(lambda x: str(x).upper().find('TOTAL')!=-1)].reset_index(drop=True)
# Drop aggregates without data for year 1
gdppc = gdppc.dropna(subset=['gdppc_1'])
# Drop the rows labelled 0 and 1, which we do not plot (copy to avoid modifying a view)
gdppc = gdppc.loc[2:].copy()
# Clean the labels: remove "Total", "Countries" and digits, and turn "European" into "Europe"
# (regex=True is required for r'\d+' to match digits; pandas 2.x defaults to regex=False)
gdppc['Country'] = gdppc.Country.str.replace('Total', '', regex=False).str.replace('Countries', '', regex=False).str.replace(r'\d+', '', regex=True).str.replace('European', 'Europe', regex=False).str.strip()
# Exclude the USSR and West Asian aggregates
gdppc = gdppc.loc[gdppc.Country.apply(lambda x: x.find('USSR')==-1 and x.find('West Asian')==-1)].reset_index(drop=True)
gdppc
Out[25]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1821 gdppc_1822 gdppc_1823 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe 576.167665 427.425665 771.093805 887.906964 993.456911 1194.184683 NaN NaN NaN ... 18497.208533 19176.001655 19463.863297 19627.707522 19801.145425 20199.220700 20522.238008 21087.304789 21589.011346 21671.774225
1 Western Offshoots 400.000000 400.000000 400 400.000000 476.000000 1201.993477 NaN NaN NaN ... 26680.580823 27393.808035 27387.312035 27648.644070 28090.274362 28807.845958 29415.399334 29922.741918 30344.425293 30151.805880
2 East Europe 411.789474 400.000000 496 548.023599 606.010638 683.160984 NaN NaN NaN ... 5734.162109 5970.165085 6143.112873 6321.395376 6573.365882 6942.136596 7261.721015 7730.097570 8192.881904 8568.967581
3 Latin America 400.000000 400.000000 416.457143 437.558140 526.639004 691.060678 NaN NaN NaN ... 5765.585093 5889.237351 5846.295193 5746.609672 5785.841237 6063.068969 6265.525702 6530.533583 6783.869986 6973.134656
4 Asia 455.671021 469.961665 568.4179 573.550859 571.605276 580.626115 NaN NaN NaN ... 3623.902724 3797.608955 3927.186275 4121.275511 4388.982705 4661.517477 4900.563281 5187.253152 5408.383588 5611.198564
5 Africa 472.352941 424.767802 413.709504 422.071584 420.628684 419.755914 NaN NaN NaN ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474

6 rows × 195 columns
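The selection and cleaning steps can also be written with pandas string methods. Here is a sketch with made-up labels. str.contains with case=False plays the role of the .upper().find('TOTAL') check. regex=True makes r'\d+' mean "one or more digits":

```python
import pandas as pd

# Made-up labels in the style of the spreadsheet
names = pd.Series(['Total 12 Western Europe', 'Austria', 'Total Latin America', 'Total Africa'])

# Case-insensitive substring match
totals = names[names.str.contains('total', case=False)]

cleaned = (totals.str.replace('Total', '', regex=False)
                 .str.replace(r'\d+', '', regex=True)
                 .str.strip())
print(cleaned.tolist())
```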

Let's drop the years (columns) that have missing values for any region.

In [26]:
gdppc = gdppc.dropna(axis=1, how='any')
gdppc
Out[26]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1870 gdppc_1900 gdppc_1913 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe 576.167665 427.425665 771.093805 887.906964 993.456911 1194.184683 1953.068150 2884.661525 3456.576178 ... 18497.208533 19176.001655 19463.863297 19627.707522 19801.145425 20199.220700 20522.238008 21087.304789 21589.011346 21671.774225
1 Western Offshoots 400.000000 400.000000 400 400.000000 476.000000 1201.993477 2419.152411 4014.870040 5232.816582 ... 26680.580823 27393.808035 27387.312035 27648.644070 28090.274362 28807.845958 29415.399334 29922.741918 30344.425293 30151.805880
2 East Europe 411.789474 400.000000 496 548.023599 606.010638 683.160984 936.628265 1437.944586 1694.879668 ... 5734.162109 5970.165085 6143.112873 6321.395376 6573.365882 6942.136596 7261.721015 7730.097570 8192.881904 8568.967581
3 Latin America 400.000000 400.000000 416.457143 437.558140 526.639004 691.060678 676.005331 1113.071149 1494.431922 ... 5765.585093 5889.237351 5846.295193 5746.609672 5785.841237 6063.068969 6265.525702 6530.533583 6783.869986 6973.134656
4 Asia 455.671021 469.961665 568.4179 573.550859 571.605276 580.626115 553.459947 637.615593 695.131881 ... 3623.902724 3797.608955 3927.186275 4121.275511 4388.982705 4661.517477 4900.563281 5187.253152 5408.383588 5611.198564
5 Africa 472.352941 424.767802 413.709504 422.071584 420.628684 419.755914 500.011054 601.236364 637.433138 ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474

6 rows × 70 columns
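With how='any', a year is kept only if every region has a value. This is why years such as 1821 disappear above. Here is a toy comparison with how='all' (illustrative values):

```python
import numpy as np
import pandas as pd

wide = pd.DataFrame({'Country': ['Asia', 'Africa'],
                     'gdppc_1820': [580.6, 419.8],
                     'gdppc_1821': [np.nan, np.nan],
                     'gdppc_1870': [553.5, np.nan]})

# how='any': drop a column if at least one value is missing
cols_any = wide.dropna(axis=1, how='any').columns.tolist()
# how='all': drop a column only if every value is missing
cols_all = wide.dropna(axis=1, how='all').columns.tolist()
print(cols_any)
print(cols_all)
```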

Let's convert from wide to long format

In [27]:
gdppc = pd.wide_to_long(gdppc, ['gdppc_'], i='Country', j='year').reset_index()
gdppc
Out[27]:
Country year gdppc_
0 Western Europe 1 576.167665
1 Western Offshoots 1 400.0
2 East Europe 1 411.789474
3 Latin America 1 400.0
4 Asia 1 455.671021
... ... ... ...
409 Western Offshoots 2008 30151.80588
410 East Europe 2008 8568.967581
411 Latin America 2008 6973.134656
412 Asia 2008 5611.198564
413 Africa 2008 1780.265474

414 rows × 3 columns
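pd.wide_to_long does the following. stubnames is the prefix shared by the value columns. i identifies each row. j names the new column that receives the suffix (the year), which pandas tries to convert to numbers. Here is a toy sketch with illustrative values, including an optional rename of the trailing underscore:

```python
import pandas as pd

wide = pd.DataFrame({'Country': ['Asia', 'Africa'],
                     'gdppc_1820': [580.6, 419.8],
                     'gdppc_1870': [553.5, 500.0]})

long = pd.wide_to_long(wide, stubnames='gdppc_', i='Country', j='year').reset_index()
long = long.rename(columns={'gdppc_': 'gdppc'})
print(long)
```

If any of the wide columns has dtype object, the stacked column does too. This is presumably why gdppc_ is converted with astype(float) before the seaborn plots below.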

Plotting

We can now plot the data. Let's try two different ways. The first uses the plot method of pandas DataFrames. The second uses the package seaborn, which builds on matplotlib. The main difference is how the data needs to be organized. pandas draws each column of a wide table as a separate line. seaborn works with long data and groups the observations using a column, here the region. Of course, these are not the only ways to plot and we can try others.

In [28]:
import matplotlib as mpl
import seaborn as sns
# Setup seaborn
sns.set()

Let's pivot the table so that each region is a column and each row is a year. This will allow us to plot using the plot function of the pandas DataFrame.

In [29]:
gdppc2 = gdppc.pivot_table(index='year',columns='Country',values='gdppc_',aggfunc='sum')
gdppc2
Out[29]:
Country Africa Asia East Europe Latin America Western Europe Western Offshoots
year
1 472.352941 455.671021 411.789474 400.000000 576.167665 400.000000
1000 424.767802 469.961665 400.000000 400.000000 427.425665 400.000000
1500 413.709504 568.417900 496.000000 416.457143 771.093805 400.000000
1600 422.071584 573.550859 548.023599 437.558140 887.906964 400.000000
1700 420.628684 571.605276 606.010638 526.639004 993.456911 476.000000
... ... ... ... ... ... ...
2004 1558.099461 4661.517477 6942.136596 6063.068969 20199.220700 28807.845958
2005 1603.686517 4900.563281 7261.721015 6265.525702 20522.238008 29415.399334
2006 1663.531318 5187.253152 7730.097570 6530.533583 21087.304789 29922.741918
2007 1724.226776 5408.383588 8192.881904 6783.869986 21589.011346 30344.425293
2008 1780.265474 5611.198564 8568.967581 6973.134656 21671.774225 30151.805880

69 rows × 6 columns
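pivot_table goes in the opposite direction, from long to wide. Here is a toy sketch with illustrative values. Each (year, Country) pair appears only once, so aggfunc='sum' just returns that single value, and DataFrame.pivot produces the same table:

```python
import pandas as pd

long = pd.DataFrame({'Country': ['Asia', 'Africa', 'Asia', 'Africa'],
                     'year': [1820, 1820, 1870, 1870],
                     'gdppc_': [580.6, 419.8, 553.5, 500.0]})

# One row per year, one column per region
wide = long.pivot_table(index='year', columns='Country', values='gdppc_', aggfunc='sum')

# pivot does no aggregation and raises an error if a (year, Country) pair is duplicated
same = long.pivot(index='year', columns='Country', values='gdppc_')
print(wide)
```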

Ok. Let's plot using the pandas plot function.

In [30]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())

# Set the size of the figure and get a figure and axis object
fig, ax = plt.subplots(figsize=(30,20))
# Plot using the axis ax and colormap my_cmap
gdppc2.loc[1800:].plot(ax=ax, linewidth=8, cmap=my_cmap)
# Change options of axes, legend
ax.tick_params(axis = 'both', which = 'major', labelsize=32)
ax.tick_params(axis = 'both', which = 'minor', labelsize=16)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(prop={'size': 40}).set_title("Region", prop = {'size':40})
# Label axes
ax.set_xlabel('Year', fontsize=36)
ax.set_ylabel('GDP per capita (1990 Int\'l US$)', fontsize=36)
Out[30]:
Text(0, 0.5, "GDP per capita (1990 Int'l US$)")
In [31]:
fig
Out[31]:

Now, let's use seaborn

In [32]:
gdppc['Region'] = gdppc.Country.astype('category')
gdppc['gdppc_'] = gdppc.gdppc_.astype(float)
# Plot
fig, ax = plt.subplots(figsize=(30,20))
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[gdppc.year>=1800].reset_index(drop=True), alpha=1, lw=8, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=False)
ax.tick_params(axis = 'both', which = 'major', labelsize=32)
ax.tick_params(axis = 'both', which = 'minor', labelsize=16)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year', fontsize=36)
ax.set_ylabel('GDP per capita (1990 Int\'l US$)', fontsize=36)
Out[32]:
Text(0, 0.5, "GDP per capita (1990 Int'l US$)")
In [33]:
fig
Out[33]:

Nice! Basically the same plot. But we can do better! Let's use seaborn again, but this time with a different marker for each region. Let's also plot only a subset of the years so that the markers do not crowd each other. Also, let's export the figure so we can use it in our slides.

In [34]:
# Create category for hue
gdppc['Region'] = gdppc.Country.astype('category')
gdppc['gdppc_'] = gdppc.gdppc_.astype(float)

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1800) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1820-2010.pdf', dpi=300, bbox_inches='tight')
In [35]:
fig
Out[35]:
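The long list of excluded years above can also be written as a rule. Keep every year up to 1950, then only multiples of 10, plus the last year. The rule assumes, as in this dataset, that the last year is 2008. The sketch below checks the rule against the explicit list on a plain range of years:

```python
import pandas as pd

years = pd.Series(range(1800, 2009), name='year')

# The explicit list used above: every year from 1951 to 2007 that is not a multiple of 10
excluded = [y for y in range(1951, 2008) if y % 10 != 0]
mask_list = (years >= 1800) & ~years.isin(excluded)

# The same selection as a rule (assumes the data end in 2008)
mask_rule = (years >= 1800) & ((years <= 1950) | (years % 10 == 0) | (years == 2008))

print(years[mask_rule].tail(6).tolist())
```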

Let's create the same plot using the updated data from the Maddison Project. Here we have fewer years, and GDP per capita is measured in 2011 international dollars, but the picture is similar.

In [36]:
maddison_new_region['Region'] = maddison_new_region.region_name

mycolors2 = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71", "orange", "b"]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='cgdppc', hue='Region', data=maddison_new_region.loc[(maddison_new_region.year.apply(lambda x: x in [1870, 1890, 1913, 1929,1950, 2016])) | ((maddison_new_region.year>1950) & (maddison_new_region.year.apply(lambda x: np.mod(x,10)==0)))], alpha=1, palette=sns.color_palette(mycolors2), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (2011 Int\'l US$)')
plt.savefig(pathgraphs + 'y1870-2016.pdf', dpi=300, bbox_inches='tight')
In [37]:
fig
Out[37]:

Let's show the evolution starting from earlier periods.

In [38]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region',
             # Keep years from 1700 on, but between 1951 and 2007 keep only decade years
             data=gdppc.loc[(gdppc.year>=1700) & ~(gdppc.year.between(1951, 2007) & (gdppc.year % 10 != 0))].reset_index(drop=True),
             alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'take-off-1700-2010.pdf', dpi=300, bbox_inches='tight')
In [39]:
fig
Out[39]:
In [40]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region',
             # Keep years from 1500 on, but between 1951 and 2007 keep only decade years
             data=gdppc.loc[(gdppc.year>=1500) & ~(gdppc.year.between(1951, 2007) & (gdppc.year % 10 != 0))].reset_index(drop=True),
             alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1500-2010.pdf', dpi=300, bbox_inches='tight')
In [41]:
fig
Out[41]:
In [42]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region',
             # Keep years from 1000 on, but between 1951 and 2007 keep only decade years
             data=gdppc.loc[(gdppc.year>=1000) & ~(gdppc.year.between(1951, 2007) & (gdppc.year % 10 != 0))].reset_index(drop=True),
             alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1000-2010.pdf', dpi=300, bbox_inches='tight')
In [43]:
fig
Out[43]:
In [44]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region',
             # Keep all years from 1 CE on, but between 1951 and 2007 keep only decade years
             data=gdppc.loc[(gdppc.year>=0) & ~(gdppc.year.between(1951, 2007) & (gdppc.year % 10 != 0))].reset_index(drop=True),
             alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1-2010.pdf', dpi=300, bbox_inches='tight')
In [45]:
fig
Out[45]:

Let's plot the evolution of GDP per capita for the world as a whole.
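The old Maddison table is in wide format, with one column per year; judging from the `wide_to_long` call below, these are presumably named `gdppc_1`, `gdppc_1000`, and so on. `pd.wide_to_long` stacks them into one row per country and year, turning the numeric suffix into the `year` column. A small sketch on made-up numbers (not Maddison data):

```python
import pandas as pd

# Toy wide table: one row per country, one column per year (illustrative values)
toy_wide = pd.DataFrame({
    'Country': ['A', 'B'],
    'gdppc_1': [400.0, 450.0],
    'gdppc_1000': [420.0, 430.0],
    'gdppc_1500': [500.0, 480.0],
})

# Stack the gdppc_<year> columns into rows; the numeric suffix becomes 'year'
toy_tidy = pd.wide_to_long(toy_wide, ['gdppc_'], i='Country', j='year').reset_index()
print(toy_tidy.sort_values(['Country', 'year']))
```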

In [46]:
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country=='World Average']
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = world_gdppc.Country.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region',
             # Keep all years from 1 CE on, but between 1951 and 2007 keep only decade years
             data=world_gdppc.loc[(world_gdppc.year>=0) & ~(world_gdppc.year.between(1951, 2007) & (world_gdppc.year % 10 != 0))].reset_index(drop=True),
             alpha=1, style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'W-y1-2010.pdf', dpi=300, bbox_inches='tight')
In [47]:
fig
Out[47]:

Let's plot $\log(\text{GDPpc})$ over the modern era of sustained economic growth (from 1950 on).
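Why the logarithm? If income grows at a constant rate $g$, then $y_t = y_0(1+g)^t$, so $\log y_t = \log y_0 + t\log(1+g)$: constant growth appears as a straight line with slope $\log(1+g) \approx g$ for small $g$. A quick numerical check, assuming a 2% annual growth rate:

```python
import numpy as np

g = 0.02                        # assumed constant annual growth rate
t = np.arange(0, 51)            # 50 years
y_path = 1000 * (1 + g) ** t    # income path starting at 1,000

slopes = np.diff(np.log(y_path))   # year-to-year change in log income
print(slopes.min(), slopes.max())  # both equal log(1.02) up to floating-point rounding
print(np.log(1 + g))               # about 0.0198, close to g = 0.02
```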

In [48]:
gdppc['lgdppc'] = np.log(gdppc.gdppc_)

# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='lgdppc', hue='Region', data=gdppc.loc[(gdppc.year>=1950)].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('Log[GDP per capita (1990 Int\'l US$)]')
plt.savefig(pathgraphs + 'sg1950-2000.pdf', dpi=300, bbox_inches='tight')
In [49]:
fig
Out[49]:
In [50]:
mycolors2 = ["#34495e", "#2ecc71"]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='cgdppc', hue='Region', data=maddison_new_region.loc[(maddison_new_region.year>=1870) & (maddison_new_region.region.apply(lambda x: x in ['we', 'wo']))], alpha=1, palette=sns.color_palette(mycolors2), style='Region', dashes=False, markers=['D', '^'],)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1f}'))
ax.set_yscale('log')
ax.set_yticks([500, 5000, 50000])
ax.get_yaxis().set_major_formatter(mpl.ticker.ScalarFormatter())
ax.legend(loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (2011 Int\'l US$, log-scale)')
plt.savefig(pathgraphs + 'sg1870-2000.pdf', dpi=300, bbox_inches='tight')

Growth Rates

Let's select a subsample of years between 1 CE and 2008 and compute the average annual growth rate of world income per capita over each period. We select the benchmark years with the `.loc` indexer and then use the `shift` method to get the value from the previous observation.
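For two observations $s$ years apart, the average annual (compound) growth rate is $g = (y_t / y_{t-s})^{1/s} - 1$; this is what the cells below compute, with `shift(1)` supplying $y_{t-s}$ and the difference in years supplying $s$. A minimal sketch on made-up numbers, chosen so the result is easy to check by hand (doubling in 35 years is roughly 2% a year):

```python
import pandas as pd

# Illustrative data: income doubles over 35 years, then stays flat for 10 years
toy = pd.DataFrame({'year': [1900, 1935, 1945], 'gdppc': [1000.0, 2000.0, 2000.0]})

toy['years_elapsed'] = toy['year'] - toy['year'].shift(1)
toy['growth'] = (toy['gdppc'] / toy['gdppc'].shift(1)) ** (1 / toy['years_elapsed']) - 1
toy['Period'] = toy['year'].astype(str).shift(1) + '-' + toy['year'].astype(str)
print(toy)
```

The first row has no predecessor, so its growth and period are `NaN`, just as in the notebook's output.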

In [51]:
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 2008]).astype(int)
world_gdppc
Out[51]:
          Country  year       gdppc_         Region  mysample
0   World Average     1   466.752281  World Average         1
1   World Average  1000   453.402162  World Average         1
2   World Average  1500   566.389464  World Average         1
3   World Average  1600   595.783856  World Average         0
4   World Average  1700   614.853602  World Average         0
...           ...   ...          ...            ...       ...
189 World Average  2004  6738.281333  World Average         0
190 World Average  2005  6960.031035  World Average         0
191 World Average  2006  7238.383483  World Average         0
192 World Average  2007  7467.648232  World Average         0
193 World Average  2008  7613.922924  World Average         1

69 rows × 5 columns

In [52]:
maddison_growth = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth['year_prev'] = maddison_growth['year'] - maddison_growth['year'].shift(1)
maddison_growth['growth'] = ((maddison_growth['gdppc_'] / maddison_growth['gdppc_'].shift(1)) ** (1/ maddison_growth.year_prev) -1)
maddison_growth['Period'] = maddison_growth['year'].astype(str).shift(1) + '-' + maddison_growth['year'].astype(str)
maddison_growth    
Out[52]:
        Country  year       gdppc_         Region  mysample  year_prev     growth     Period
0 World Average     1   466.752281  World Average         1        NaN        NaN        NaN
1 World Average  1000   453.402162  World Average         1      999.0  -0.000029     1-1000
2 World Average  1500   566.389464  World Average         1      500.0   0.000445  1000-1500
3 World Average  1820   665.735330  World Average         1      320.0   0.000505  1500-1820
4 World Average  2008  7613.922924  World Average         1      188.0   0.013046  1820-2008
In [53]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues", maddison_growth.shape[0]+4)[4:])
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
#handles, labels = ax.get_legend_handles_labels()
#ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate of Income per capita')
plt.savefig(pathgraphs + 'W-g1-2010.pdf', dpi=300, bbox_inches='tight')
In [54]:
fig
Out[54]:

Growth of population and income (by region)

In [55]:
# Growth rates gdppc
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country=='World Average']
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = 'World'
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)
print(maddison_growth_gdppc)
         Country  year       gdppc_ Region  mysample  year_prev    growth     Period
0  World Average     1   466.752281  World         1        NaN       NaN        NaN
1  World Average  1000   453.402162  World         1      999.0 -0.000029     1-1000
2  World Average  1500   566.389464  World         1      500.0  0.000445  1000-1500
3  World Average  1820   665.735330  World         1      320.0  0.000505  1500-1820
4  World Average  1913  1524.430799  World         1       93.0  0.008948  1820-1913
In [56]:
# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country=='World Total']
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = 'World'
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
print(maddison_growth_pop)    
       Country  year          pop_ Region  mysample  year_prev    growth     Period
0  World Total     1  2.258200e+05  World         1        NaN       NaN        NaN
1  World Total  1000  2.673300e+05  World         1      999.0  0.000169     1-1000
2  World Total  1500  4.384280e+05  World         1      500.0  0.000990  1000-1500
3  World Total  1820  1.041708e+06  World         1      320.0  0.002708  1500-1820
4  World Total  1913  1.792925e+06  World         1       93.0  0.005856  1820-1913
In [57]:
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'GDPpc', 'growth_pop':'Population'})
maddison_growth
Out[57]:
  Region     Period      GDPpc  Population
1  World     1-1000  -0.000029    0.000169
2  World  1000-1500   0.000445    0.000990
3  World  1500-1820   0.000505    0.002708
4  World  1820-1913   0.008948    0.005856
In [58]:
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 
maddison_growth
Out[58]:
  Region     Period           variable     growth
0  World     1-1000  Income per capita  -0.000029
1  World  1000-1500  Income per capita   0.000445
2  World  1500-1820  Income per capita   0.000505
3  World  1820-1913  Income per capita   0.008948
4  World     1-1000         Population   0.000169
5  World  1000-1500         Population   0.000990
6  World  1500-1820         Population   0.002708
7  World  1820-1913         Population   0.005856
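Here `merge` lines up the two growth series by `Region` and `Period`, and `pd.melt` then turns the wide table (one column per series) into a long one (one row per period and series), so that seaborn can map the `variable` column to `hue` in the bar chart. A toy version with illustrative numbers:

```python
import pandas as pd

# Illustrative growth rates (not Maddison figures)
toy_income = pd.DataFrame({'Region': ['World'] * 2, 'Period': ['1-1000', '1000-1500'],
                           'growth': [0.0001, 0.0004]})
toy_pop = pd.DataFrame({'Region': ['World'] * 2, 'Period': ['1-1000', '1000-1500'],
                        'growth': [0.0002, 0.0010]})

# Merge on Region and Period; the suffixes disambiguate the two 'growth' columns
toy_merged = toy_income.merge(toy_pop, on=['Region', 'Period'], suffixes=('_gdppc', '_pop'))
toy_merged = toy_merged.rename(columns={'growth_gdppc': 'Income per capita', 'growth_pop': 'Population'})

# Wide -> long: one row per (Period, variable)
toy_long = pd.melt(toy_merged, id_vars=['Region', 'Period'],
                   value_vars=['Income per capita', 'Population'],
                   var_name='variable', value_name='growth')
print(toy_long)
```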
In [59]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + 'W-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
In [60]:
fig
Out[60]:
In [61]:
# Growth rates gdppc
myregion = 'Western Offshoots'
fname = 'WO'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
In [62]:
fig
Out[62]:
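Each region cell repeats the same steps with only `myregion` and `fname` changing. One way to cut the repetition is a small helper for the growth step. The sketch below is an assumption, not the notebook's code: `period_growth` is a hypothetical name, it expects a long table with a numeric `year` column and one value column (so the region filter and `wide_to_long` reshape still happen beforehand), and the data are illustrative:

```python
import pandas as pd

def period_growth(df: pd.DataFrame, value_col: str, years) -> pd.DataFrame:
    """Annualized growth of `value_col` between consecutive years in `years`.

    Hypothetical helper: expects columns 'year' and `value_col`, one row per year.
    """
    sample = (df.dropna(subset=[value_col])
                .loc[lambda d: d['year'].isin(years)]
                .sort_values('year')
                .reset_index(drop=True))
    span = sample['year'] - sample['year'].shift(1)  # years between observations
    sample['growth'] = (sample[value_col] / sample[value_col].shift(1)) ** (1 / span) - 1
    sample['Period'] = sample['year'].astype(str).shift(1) + '-' + sample['year'].astype(str)
    return sample.dropna(subset=['growth'])[['Period', 'growth']]

# Illustrative data for one region (not Maddison figures)
toy_income = pd.DataFrame({'year': [1, 1000, 1500, 1600], 'gdppc_': [400.0, 400.0, 800.0, 900.0]})
toy_pop = pd.DataFrame({'year': [1, 1000, 1500, 1600], 'pop_': [100.0, 200.0, 200.0, 250.0]})

sample_years = [1, 1000, 1500]
toy_growth = (period_growth(toy_income, 'gdppc_', sample_years)
              .merge(period_growth(toy_pop, 'pop_', sample_years), on='Period',
                     suffixes=('_gdppc', '_pop'))
              .rename(columns={'growth_gdppc': 'Income per capita', 'growth_pop': 'Population'}))
print(toy_growth)
```

For an actual region one would pass the reshaped `world_gdppc` and `world_pop` tables with `sample_years = [1, 1000, 1500, 1820, 1913]`, then apply `pd.melt` to the result as in the cells above.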
In [63]:
# Growth rates gdppc
myregion = 'Western Europe'
fname = 'WE'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total 30  '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total 30  '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
In [64]:
fig
Out[64]:
In [65]:
# Growth rates gdppc
myregion = 'Latin America'
fname = 'LA'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
In [66]:
fig
Out[66]:
In [67]:
# Growth rates gdppc
myregion = 'Asia'
fname = 'AS'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
In [68]:
fig
Out[68]:
In [69]:
# Growth rates gdppc
myregion = 'Africa'
fname = 'AF'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')
In [70]:
fig
Out[70]:
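The growth cells above compute a compound annual growth rate between benchmark years: growth = (x_t / x_prev)^(1 / years elapsed) - 1. Note that the column called year_prev actually holds the number of years elapsed since the previous benchmark, not the previous year itself. Here is a minimal sketch of the same calculation on a small hypothetical series (the numbers and the name bench are made up for illustration), chosen so the answer is easy to check: a level that doubles over 100 years grows by about 0.7% per year.

```python
import pandas as pd

# Hypothetical benchmark levels (not Maddison data): doubling over 100 years, then over 50 years
bench = pd.DataFrame({'year': [1820, 1920, 1970], 'level': [100.0, 200.0, 400.0]})

years_elapsed = bench['year'] - bench['year'].shift(1)    # years between consecutive benchmarks
gross_change = bench['level'] / bench['level'].shift(1)    # level relative to the previous benchmark
bench['growth'] = gross_change ** (1 / years_elapsed) - 1  # compound annual growth rate
bench['Period'] = bench['year'].astype(str).shift(1) + '-' + bench['year'].astype(str)
print(bench)
```

The first row has no previous benchmark, so its growth rate and period are missing; this is why the cells above call dropna() after merging.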

Comparing richest to poorest region across time

Let's create a table that shows GDP per capita for the 6 regions in the original data and, for each year, the ratio of GDP per capita in the richest region to that in the poorest. Let's also plot it.

In [71]:
gdppc2['Richest-Poorest Ratio'] = gdppc2.max(axis=1) / gdppc2.min(axis=1)
gdp_ratio = gdppc2.loc[[1, 1000, 1500, 1700, 1820, 1870, 1913, 1940, 1960, 1980, 2000, 2008]].reset_index()
gdp_ratio['Region'] = 'Richest-Poorest'
gdp_ratio['Region'] = gdp_ratio.Region.astype('category')
In [72]:
gdp_ratio
Out[72]:
Country year Africa Asia East Europe Latin America Western Europe Western Offshoots Richest-Poorest Ratio Region
0 1 472.352941 455.671021 411.789474 400.000000 576.167665 400.000000 1.440419 Richest-Poorest
1 1000 424.767802 469.961665 400.000000 400.000000 427.425665 400.000000 1.174904 Richest-Poorest
2 1500 413.709504 568.417900 496.000000 416.457143 771.093805 400.000000 1.927735 Richest-Poorest
3 1700 420.628684 571.605276 606.010638 526.639004 993.456911 476.000000 2.361838 Richest-Poorest
4 1820 419.755914 580.626115 683.160984 691.060678 1194.184683 1201.993477 2.863553 Richest-Poorest
5 1870 500.011054 553.459947 936.628265 676.005331 1953.068150 2419.152411 4.838198 Richest-Poorest
6 1913 637.433138 695.131881 1694.879668 1494.431922 3456.576178 5232.816582 8.209201 Richest-Poorest
7 1940 813.374613 893.992784 1968.706774 1932.850716 4554.045082 6837.844866 8.406760 Richest-Poorest
8 1960 1055.114678 1025.743131 3069.750386 3135.517072 6879.294331 10961.082848 10.685992 Richest-Poorest
9 1980 1514.558119 2028.654705 5785.933433 5437.924365 13154.033928 18060.162963 11.924378 Richest-Poorest
10 2000 1447.071701 3797.608955 5970.165085 5889.237351 19176.001655 27393.808035 18.930512 Richest-Poorest
11 2008 1780.265474 5611.198564 8568.967581 6973.134656 21671.774225 30151.805880 16.936691 Richest-Poorest
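max(axis=1) and min(axis=1) only return the extreme values. If you also want to know which regions they belong to, idxmax(axis=1) and idxmin(axis=1) return the corresponding column labels. A small sketch using four of the six regions for two years, with values copied from the table above (the names levels and summary are just for illustration):

```python
import pandas as pd

# Four of the six regions for years 1 and 2008, values copied from Out[72]
levels = pd.DataFrame(
    {'Africa': [472.352941, 1780.265474],
     'Asia': [455.671021, 5611.198564],
     'Western Europe': [576.167665, 21671.774225],
     'Western Offshoots': [400.000000, 30151.805880]},
    index=pd.Index([1, 2008], name='year'))

ratio = levels.max(axis=1) / levels.min(axis=1)  # richest / poorest, row by row
richest = levels.idxmax(axis=1)                  # label of the column holding the maximum
poorest = levels.idxmin(axis=1)                  # label of the column holding the minimum
summary = pd.DataFrame({'Richest': richest, 'Poorest': poorest, 'Ratio': ratio})
print(summary)
```

When there is a tie, idxmin returns the first column holding the minimum; in year 1 of the full table, for instance, Latin America and Western Offshoots both sit at 400.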
In [73]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Richest-Poorest Ratio', data=gdp_ratio, alpha=1, hue='Region', style='Region', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Richest-Poorest Ratio')
plt.savefig(pathgraphs + 'Richest-Poorest-Ratio.pdf', dpi=300, bbox_inches='tight')
In [74]:
fig
Out[74]:

Visualize as Table

In [75]:
gdp_ratio.style.format({
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
})
Out[75]:
Country year Africa Asia East Europe Latin America Western Europe Western Offshoots Richest-Poorest Ratio Region
0 1 472.352941 455.671021 411.789474 400.000000 576.167665 400.000000 1.440419 Richest-Poorest
1 1000 424.767802 469.961665 400.000000 400.000000 427.425665 400.000000 1.174904 Richest-Poorest
2 1500 413.709504 568.417900 496.000000 416.457143 771.093805 400.000000 1.927735 Richest-Poorest
3 1700 420.628684 571.605276 606.010638 526.639004 993.456911 476.000000 2.361838 Richest-Poorest
4 1820 419.755914 580.626115 683.160984 691.060678 1194.184683 1201.993477 2.863553 Richest-Poorest
5 1870 500.011054 553.459947 936.628265 676.005331 1953.068150 2419.152411 4.838198 Richest-Poorest
6 1913 637.433138 695.131881 1694.879668 1494.431922 3456.576178 5232.816582 8.209201 Richest-Poorest
7 1940 813.374613 893.992784 1968.706774 1932.850716 4554.045082 6837.844866 8.406760 Richest-Poorest
8 1960 1055.114678 1025.743131 3069.750386 3135.517072 6879.294331 10961.082848 10.685992 Richest-Poorest
9 1980 1514.558119 2028.654705 5785.933433 5437.924365 13154.033928 18060.162963 11.924378 Richest-Poorest
10 2000 1447.071701 3797.608955 5970.165085 5889.237351 19176.001655 27393.808035 18.930512 Richest-Poorest
11 2008 1780.265474 5611.198564 8568.967581 6973.134656 21671.774225 30151.805880 16.936691 Richest-Poorest

Export table to LaTeX

Let's print the table as LaTeX code that can be copied and pasted into our slides or paper.

In [76]:
print(gdp_ratio.to_latex(formatters={
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
}))
\begin{tabular}{lrrrrrrrrl}
\toprule
Country &  year &       Africa &         Asia &  East Europe &  Latin America &  Western Europe &  Western Offshoots &  Richest-Poorest Ratio &           Region \\
\midrule
0  &     1 &   472.352941 &   455.671021 &   411.789474 &     400.000000 &      576.167665 &         400.000000 &               1.440419 &  Richest-Poorest \\
1  &  1000 &   424.767802 &   469.961665 &   400.000000 &     400.000000 &      427.425665 &         400.000000 &               1.174904 &  Richest-Poorest \\
2  &  1500 &   413.709504 &   568.417900 &   496.000000 &     416.457143 &      771.093805 &         400.000000 &               1.927735 &  Richest-Poorest \\
3  &  1700 &   420.628684 &   571.605276 &   606.010638 &     526.639004 &      993.456911 &         476.000000 &               2.361838 &  Richest-Poorest \\
4  &  1820 &   419.755914 &   580.626115 &   683.160984 &     691.060678 &     1194.184683 &        1201.993477 &               2.863553 &  Richest-Poorest \\
5  &  1870 &   500.011054 &   553.459947 &   936.628265 &     676.005331 &     1953.068150 &        2419.152411 &               4.838198 &  Richest-Poorest \\
6  &  1913 &   637.433138 &   695.131881 &  1694.879668 &    1494.431922 &     3456.576178 &        5232.816582 &               8.209201 &  Richest-Poorest \\
7  &  1940 &   813.374613 &   893.992784 &  1968.706774 &    1932.850716 &     4554.045082 &        6837.844866 &               8.406760 &  Richest-Poorest \\
8  &  1960 &  1055.114678 &  1025.743131 &  3069.750386 &    3135.517072 &     6879.294331 &       10961.082848 &              10.685992 &  Richest-Poorest \\
9  &  1980 &  1514.558119 &  2028.654705 &  5785.933433 &    5437.924365 &    13154.033928 &       18060.162963 &              11.924378 &  Richest-Poorest \\
10 &  2000 &  1447.071701 &  3797.608955 &  5970.165085 &    5889.237351 &    19176.001655 &       27393.808035 &              18.930512 &  Richest-Poorest \\
11 &  2008 &  1780.265474 &  5611.198564 &  8568.967581 &    6973.134656 &    21671.774225 &       30151.805880 &              16.936691 &  Richest-Poorest \\
\bottomrule
\end{tabular}

/var/folders/q1/7qsx8kmj439d81kr4f_k_wbr0000gp/T/ipykernel_10270/1652423148.py:1: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
  print(gdp_ratio.to_latex(formatters={
In [77]:
%%latex
\begin{tabular}{lrrrrrrrrrrrr}
\toprule
year &  1    &  1000 &  1500 &  1700 &    1820 &    1870 &    1913 &    1940 &     1960 &     1980 &     2000 &     2008 \\
Country               &       &       &       &       &         &         &         &         &          &          &          &          \\
\midrule
Africa                & 472.4 & 424.8 & 413.7 & 420.6 &   419.8 &   500.0 &   637.4 &   813.4 &  1,055.1 &  1,514.6 &  1,447.1 &  1,780.3 \\
Asia                  & 455.7 & 470.0 & 568.4 & 571.6 &   580.6 &   553.5 &   695.1 &   894.0 &  1,025.7 &  2,028.7 &  3,797.6 &  5,611.2 \\
East Europe           & 411.8 & 400.0 & 496.0 & 606.0 &   683.2 &   936.6 & 1,694.9 & 1,968.7 &  3,069.8 &  5,785.9 &  5,970.2 &  8,569.0 \\
Latin America         & 400.0 & 400.0 & 416.5 & 526.6 &   691.1 &   676.0 & 1,494.4 & 1,932.9 &  3,135.5 &  5,437.9 &  5,889.2 &  6,973.1 \\
Western Europe        & 576.2 & 427.4 & 771.1 & 993.5 & 1,194.2 & 1,953.1 & 3,456.6 & 4,554.0 &  6,879.3 & 13,154.0 & 19,176.0 & 21,671.8 \\
Western Offshoots     & 400.0 & 400.0 & 400.0 & 476.0 & 1,202.0 & 2,419.2 & 5,232.8 & 6,837.8 & 10,961.1 & 18,060.2 & 27,393.8 & 30,151.8 \\
Richest-Poorest Ratio &   1.4 &   1.2 &   1.9 &   2.4 &     2.9 &     4.8 &     8.2 &     8.4 &     10.7 &     11.9 &     18.9 &     16.9 \\
\bottomrule
\end{tabular}

Export Table to HTML

In [78]:
from IPython.display import display, HTML
display(HTML(gdp_ratio.to_html(formatters={
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
})))
Country year Africa Asia East Europe Latin America Western Europe Western Offshoots Richest-Poorest Ratio Region
0 1 472.352941 455.671021 411.789474 400.000000 576.167665 400.000000 1.440419 Richest-Poorest
1 1000 424.767802 469.961665 400.000000 400.000000 427.425665 400.000000 1.174904 Richest-Poorest
2 1500 413.709504 568.417900 496.000000 416.457143 771.093805 400.000000 1.927735 Richest-Poorest
3 1700 420.628684 571.605276 606.010638 526.639004 993.456911 476.000000 2.361838 Richest-Poorest
4 1820 419.755914 580.626115 683.160984 691.060678 1194.184683 1201.993477 2.863553 Richest-Poorest
5 1870 500.011054 553.459947 936.628265 676.005331 1953.068150 2419.152411 4.838198 Richest-Poorest
6 1913 637.433138 695.131881 1694.879668 1494.431922 3456.576178 5232.816582 8.209201 Richest-Poorest
7 1940 813.374613 893.992784 1968.706774 1932.850716 4554.045082 6837.844866 8.406760 Richest-Poorest
8 1960 1055.114678 1025.743131 3069.750386 3135.517072 6879.294331 10961.082848 10.685992 Richest-Poorest
9 1980 1514.558119 2028.654705 5785.933433 5437.924365 13154.033928 18060.162963 11.924378 Richest-Poorest
10 2000 1447.071701 3797.608955 5970.165085 5889.237351 19176.001655 27393.808035 18.930512 Richest-Poorest
11 2008 1780.265474 5611.198564 8568.967581 6973.134656 21671.774225 30151.805880 16.936691 Richest-Poorest
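Notice that the LaTeX and HTML outputs above still show six decimals. The formatters dictionaries passed to to_latex and to_html are keyed by year (1, 1000, ...), but the columns of gdp_ratio are year, the region names, the ratio and Region, so none of the formatters matches a column. One way to get the region-by-year layout of the %%latex table with one decimal is to move year into the index, drop the Region column and transpose; a single float_format then replaces the dictionary. A sketch on a two-row excerpt of the table (ratio_sample and wide are hypothetical names):

```python
import pandas as pd

# Two rows of gdp_ratio (Out[72]), keeping only two regions for brevity
ratio_sample = pd.DataFrame({
    'year': [1, 2008],
    'Africa': [472.352941, 1780.265474],
    'Western Offshoots': [400.000000, 30151.805880],
    'Region': ['Richest-Poorest', 'Richest-Poorest'],
})

# Regions in rows, years in columns, as in the %%latex table
wide = ratio_sample.set_index('year').drop(columns='Region').T
# One float format for every cell, so no per-column dictionary is needed
html = wide.to_html(float_format='{:,.1f}'.format)
print(html)
```

The same float_format argument is also accepted by DataFrame.to_latex.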

Take-off, industrialization and reversals

Industrialization per capita

Let's create a full dataframe by entering the data by hand. It is based on data from Bairoch, P., 1982. "International industrialization levels from 1750 to 1980". Journal of European Economic History, 11(2), p.269. For 1750-1913 the data come from Table 9.


In [79]:
industrialization = [['Developed Countries', 8, 8, 11, 16, 24, 35, 55],
                     ['Europe', 8, 8, 11, 17, 23, 33, 45],
                     ['Austria-Hungary', 7, 7, 8, 11, 15, 23, 32],
                     ['Belgium', 9, 10, 14, 28, 43, 56, 88],
                     ['France', 9, 9, 12, 20, 28, 39, 59],
                     ['Germany', 8, 8, 9, 15, 25, 52, 85],
                     ['Italy', 8, 8, 8, 10, 12, 17, 26],
                     ['Russia', 6, 6, 7, 8, 10, 15, 20],
                     ['Spain', 7, 7, 8, 11, 14, 19, 22],
                     ['Sweden', 7, 8, 9, 15, 24, 41, 67],
                     ['Switzerland', 7, 10, 16, 26, 39, 67, 87],
                     ['United Kingdom', 10, 16, 25, 64, 87, 100, 115],
                     ['Canada', np.nan, 5, 6, 7, 10, 24, 46],
                     ['United States', 4, 9, 14, 21, 38, 69, 126],
                     ['Japan', 7, 7, 7, 7, 9, 12, 20],
                     ['Third World', 7, 6, 6, 4, 3, 2, 2],
                     ['China', 8, 6, 6, 4, 4, 3, 3],
                     ['India', 7, 6, 6, 3, 2, 1, 2],
                     ['Brazil', np.nan, np.nan, np.nan, 4, 4, 5, 7],
                     ['Mexico', np.nan, np.nan, np.nan, 5, 4, 5, 7],
                     ['World', 7, 6, 7, 7, 9, 14, 21]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
industrialization = pd.DataFrame(industrialization, columns=['Country'] + ['y'+str(y) for y in years])

For 1913-1980 the data come from Table 12.


In [80]:
industrialization2 = [['Developed Countries', 55, 71, 81, 135, 194, 315, 344],
                      ['Market Economies', np.nan, 96, 105, 167, 222, 362, 387],
                      ['Europe', 45, 76, 94, 107, 166, 260, 280],
                      ['Belgium', 88, 116, 89, 117, 183, 291, 316],
                      ['France', 59, 82, 73, 95, 167, 259, 277],
                      ['Germany', 85, 101, 128, 144, 244, 366, 395],
                      ['Italy', 26, 39, 44, 61, 121, 194, 231],
                      ['Spain', 22, 28, 23, 31, 56, 144, 159],
                      ['Sweden', 67, 84, 135, 163, 262, 405, 409],
                      ['Switzerland', 87, 90, 88, 167, 259, 366, 354],
                      ['United Kingdom', 115, 122, 157, 210, 253, 341, 325],
                      ['Canada', 46, 82, 84, 185, 237, 370, 379],
                      ['United States', 126, 182, 167, 354, 393, 604, 629],
                      ['Japan', 20, 30, 51, 40, 113, 310, 353],
                      ['U.S.S.R.', 20, 20, 38, 73, 139, 222, 252],
                      ['Third World', 2, 3, 4, 5, 8, 14, 17],
                      ['India', 2, 3, 4, 6, 8, 14, 16],
                      ['Brazil', 7, 10, 10, 13, 23, 42, 55],
                      ['Mexico', 7, 9, 8, 12, 22, 36, 41],
                      ['China', 3, 4, 4, 5, 10, 18, 24],
                      ['World', 21, 28, 31, 48, 66, 100, 103]]
years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
industrialization2 = pd.DataFrame(industrialization2, columns=['Country'] + ['y'+str(y) for y in years])

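Let's join both dataframes so we can plot the whole series. Countries that appear in only one of the two tables (Austria-Hungary, Russia, Market Economies and U.S.S.R.) do not survive the merge, as the output below shows.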

In [81]:
industrialization = industrialization.merge(industrialization2)
industrialization
Out[81]:
Country y1750 y1800 y1830 y1860 y1880 y1900 y1913 y1928 y1938 y1953 y1963 y1973 y1980
0 Developed Countries 8.0 8.0 11.0 16 24 35 55 71 81 135 194 315 344
1 Europe 8.0 8.0 11.0 17 23 33 45 76 94 107 166 260 280
2 Belgium 9.0 10.0 14.0 28 43 56 88 116 89 117 183 291 316
3 France 9.0 9.0 12.0 20 28 39 59 82 73 95 167 259 277
4 Germany 8.0 8.0 9.0 15 25 52 85 101 128 144 244 366 395
5 Italy 8.0 8.0 8.0 10 12 17 26 39 44 61 121 194 231
6 Spain 7.0 7.0 8.0 11 14 19 22 28 23 31 56 144 159
7 Sweden 7.0 8.0 9.0 15 24 41 67 84 135 163 262 405 409
8 Switzerland 7.0 10.0 16.0 26 39 67 87 90 88 167 259 366 354
9 United Kingdom 10.0 16.0 25.0 64 87 100 115 122 157 210 253 341 325
10 Canada NaN 5.0 6.0 7 10 24 46 82 84 185 237 370 379
11 United States 4.0 9.0 14.0 21 38 69 126 182 167 354 393 604 629
12 Japan 7.0 7.0 7.0 7 9 12 20 30 51 40 113 310 353
13 Third World 7.0 6.0 6.0 4 3 2 2 3 4 5 8 14 17
14 China 8.0 6.0 6.0 4 4 3 3 4 4 5 10 18 24
15 India 7.0 6.0 6.0 3 2 1 2 3 4 6 8 14 16
16 Brazil NaN NaN NaN 4 4 5 7 10 10 13 23 42 55
17 Mexico NaN NaN NaN 5 4 5 7 9 8 12 22 36 41
18 World 7.0 6.0 7.0 7 9 14 21 28 31 48 66 100 103
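Called without on=, merge joins on every column the two dataframes share, here Country and y1913, and by default (how='inner') keeps only rows found in both. A quick way to see which rows such a merge discards is to repeat it as an outer join with indicator=True, which adds a _merge column saying whether each row came from the left table, the right one, or both. A sketch on a two-country excerpt of the two tables (early, late, inner, check and dropped are illustrative names):

```python
import pandas as pd

# Excerpts of the two Bairoch tables (values from In [79] and In [80])
early = pd.DataFrame({'Country': ['Belgium', 'Russia'],
                      'y1900': [56, 15],
                      'y1913': [88, 20]})
late = pd.DataFrame({'Country': ['Belgium', 'U.S.S.R.'],
                     'y1913': [88, 20],
                     'y1928': [116, 20]})

# No on=: joins on the shared columns Country and y1913, inner join
inner = early.merge(late)

# Outer join with an indicator column to list the rows the inner join drops
check = early.merge(late, how='outer', indicator=True)
dropped = check.loc[check['_merge'] != 'both', ['Country', '_merge']]
print(dropped)
```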

Let's convert to long format and plot the evolution of industrialization across regions and groups of countries.

In [82]:
industrialization = pd.wide_to_long(industrialization, ['y'], i='Country', j='year').reset_index()
industrialization.rename(columns={'y':'Industrialization'}, inplace=True)
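pd.wide_to_long finds every column whose name starts with the stub 'y', stacks those columns into one, and stores the remaining suffix (1750, 1800, ...) in the new column named by j, converting it to a number when it can. This is why the column names were built as 'y' + str(year). A toy two-country example with values from In [79] (wide and long are illustrative names):

```python
import pandas as pd

# Two countries and two years from the 1750-1913 table
wide = pd.DataFrame({'Country': ['India', 'United Kingdom'],
                     'y1750': [7, 10],
                     'y1800': [6, 16]})

# Stub 'y' is stripped from the column names; the numeric suffix becomes 'year'
long = (pd.wide_to_long(wide, stubnames=['y'], i='Country', j='year')
          .reset_index()
          .rename(columns={'y': 'Industrialization'}))
print(long)
```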
In [83]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to set up a color map for matplotlib (my_cmap is not used in the plot below)
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.Country.apply(lambda x: x in ['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')
In [84]:
fig
Out[84]:
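The cells in this section select rows with Country.apply(lambda x: x in [...]). Series.isin performs the same membership test in a single vectorized call, and several conditions can be combined with & as long as each one is wrapped in parentheses. A small sketch with made-up rows (df, groups and the masks are illustrative names):

```python
import pandas as pd

# Made-up rows, only to compare the two ways of building the mask
df = pd.DataFrame({'Country': ['World', 'India', 'Third World', 'Japan'],
                   'year': [1900, 1900, 1913, 1913]})
groups = ['Developed Countries', 'Third World', 'World']

mask_apply = df.Country.apply(lambda x: x in groups)  # element by element, as above
mask_isin = df.Country.isin(groups)                   # vectorized equivalent

# Combine conditions with &, each in parentheses
subset = df.loc[mask_isin & (df.year <= 1900)]
print(subset)
```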
In [85]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

industrialization['dev_level'] = industrialization.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[:11],
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-Dev.pdf', dpi=300, bbox_inches='tight')
In [86]:
fig
Out[86]:
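Series.map with a dictionary returns NaN for every value that is not a key. Aggregates such as World, Europe, Third World or Developed Countries therefore get a missing dev_level, and the filters dev_level=='Developed' and dev_level=='Developing' leave them out, which keeps the country plots free of regional totals. A quick check on a few names (countries, dev_sample, levels and developed are illustrative names):

```python
import pandas as pd

countries = pd.Series(['Belgium', 'India', 'World', 'Third World'])
dev_sample = {'Belgium': 'Developed', 'India': 'Developing'}

levels = countries.map(dev_sample)            # NaN where the name is not a key
developed = countries[levels == 'Developed']  # NaN == 'Developed' is False
print(levels)
```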
In [87]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[11:],
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-NonDev.pdf', dpi=300, bbox_inches='tight')
In [88]:
fig
Out[88]:
In [89]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[
                 (industrialization.Country.apply(lambda x: x in ['India', 'United Kingdom'])) & 
                 (industrialization.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[:2],
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-UK-IND.pdf', dpi=300, bbox_inches='tight')
In [90]:
fig
Out[90]:

Manufacturing

Let's use data from the same source to explore how each country's and region's share of world manufacturing output changed over time.


In [91]:
# 1750-1913
manufacturing = [['Developed Countries', 27.0, 32.3, 39.5, 63.4, 79.1, 89.0, 92.5],
                 ['Europe', 23.2, 28.1, 34.2, 53.2, 61.3, 62.0, 56.6],
                 ['Austria-Hungary', 2.9, 3.2, 3.2, 4.2, 4.4, 4.7, 4.4],
                 ['Belgium', 0.3, 0.5, 0.7, 1.4, 1.8, 1.7, 1.8],
                 ['France', 4.0, 4.2, 5.2, 7.9, 7.8, 6.8, 6.1],
                 ['Germany', 2.9, 3.5, 3.5, 4.9, 8.5, 13.2, 14.8],
                 ['Italy', 2.4, 2.5, 2.3, 2.5, 2.5, 2.5, 2.4],
                 ['Russia', 5.0, 5.6, 5.6, 7.0, 7.6, 8.8, 8.2],
                 ['Spain', 1.2, 1.5, 1.5, 1.8, 1.8, 1.6, 1.2],
                 ['Sweden', 0.3, 0.3, 0.4, 0.6, 0.8, 0.9, 1.0],
                 ['Switzerland', 0.1, 0.3, 0.4, 0.7, 0.8, 1.0, 0.9],
                 ['United Kingdom', 1.9, 4.3, 9.5, 19.9, 22.9, 18.5, 13.6],
                 ['Canada', np.nan, np.nan, 0.1, 0.3, 0.4, 0.6, 0.9],
                 ['United States', 0.1, 0.8, 2.4, 7.2, 14.7, 23.6, 32.0],
                 ['Japan', 3.8, 3.5, 2.8, 2.6, 2.4, 2.4, 2.7],
                 ['Third World', 73.0, 67.7, 60.5, 36.6, 20.9, 11.0, 7.5],
                 ['China', 32.8, 33.3, 29.8, 19.7, 12.5, 6.2, 3.6],
                 ['India', 24.5, 19.7, 17.6, 8.6, 2.8, 1.7, 1.4],
                 ['Brazil', np.nan, np.nan, np.nan, 0.4, 0.3, 0.4, 0.5],
                 ['Mexico', np.nan, np.nan, np.nan, 0.4, 0.3, 0.3, 0.3]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
manufacturing = pd.DataFrame(manufacturing, columns=['Country'] + ['y'+str(y) for y in years])

# 1913-1980
manufacturing2 = [['Developed Countries', 92.5, 92.8, 92.8, 93.5, 91.5, 90.1, 88.0],
                  ['Market Economies', 76.7, 80.3, 76.5, 77.5, 70.5, 70.0, 66.9],
                  ['Europe', 40.8, 35.4, 37.3, 26.1, 26.5, 24.5, 22.9],
                  ['Belgium', 1.8, 1.7, 1.1, 0.8, 0.8, 0.7, 0.7],
                  ['France', 6.1, 6.0, 4.4, 3.2, 3.8, 3.5, 3.3],
                  ['Germany', 14.8, 11.6, 12.7, 5.9, 6.4, 5.9, 5.3],
                  ['Italy', 2.4, 2.7, 2.8, 2.3, 2.9, 2.9, 2.9],
                  ['Spain', 1.2, 1.1, 0.8, 0.7, 0.8, 1.3, 1.4],
                  ['Sweden', 1.0, 0.9, 1.2, 0.9, 0.9, 0.9, 0.8],
                  ['Switzerland', 0.9, 0.7, 0.5, 0.7, 0.7, 0.6, 0.5],
                  ['United Kingdom', 13.6, 9.9, 10.7, 8.4, 6.4, 4.9, 4.0],
                  ['Canada', 0.9, 1.5, 1.4, 2.2, 2.1, 2.1, 2.0],
                  ['United States', 32.0, 39.3, 31.4, 44.7, 35.1, 33.0, 31.5],
                  ['Japan', 2.7, 3.3, 5.2, 2.9, 5.1, 8.8, 9.1],
                  ['U.S.S.R.', 8.2, 5.3, 9.0, 10.7, 14.2, 14.4, 14.8],
                  ['Third World', 7.5, 7.2, 7.2, 6.5, 8.5, 9.9, 12.0],
                  ['India', 1.4, 1.9, 2.4, 1.7, 1.8, 2.1, 2.3],
                  ['Brazil', 0.5, 0.6, 0.6, 0.6, 0.8, 1.1, 1.4],
                  ['Mexico', 0.3, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6],
                  ['China', 3.6, 3.4, 3.1, 2.3, 3.5, 3.9, 5.0]]
years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
manufacturing2 = pd.DataFrame(manufacturing2, columns=['Country'] + ['y'+str(y) for y in years])

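# Merge. Without on=, merge() joins on every shared column (Country and y1913) and keeps only
# rows found in both tables, so Europe (56.6 vs 40.8 in 1913) and countries listed in only
# one table (Austria-Hungary, Russia, Market Economies, U.S.S.R.) are dropped.
manufacturing = manufacturing.merge(manufacturing2)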
manufacturing = pd.wide_to_long(manufacturing, ['y'], i='Country', j='year').reset_index()
manufacturing.rename(columns={'y':'manufacturing'}, inplace=True)
manufacturing['manufacturing'] = manufacturing.manufacturing / 100
manufacturing
Out[91]:
Country year manufacturing
0 Developed Countries 1750 0.270
1 Belgium 1750 0.003
2 France 1750 0.040
3 Germany 1750 0.029
4 Italy 1750 0.024
... ... ... ...
216 Third World 1980 0.120
217 China 1980 0.050
218 India 1980 0.023
219 Brazil 1980 0.014
220 Mexico 1980 0.006

221 rows × 3 columns
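Developed Countries and Third World appear to be exhaustive groups in these tables: their shares add up to 100% in every benchmark year. A minimal sketch of that sanity check, pivoting three years copied from In [91] so each group becomes a column (shares, wide and gap are illustrative names):

```python
import pandas as pd

# Shares for three benchmark years, copied from In [91] and divided by 100
shares = pd.DataFrame({
    'Country': ['Developed Countries', 'Third World'] * 3,
    'year': [1750, 1750, 1913, 1913, 1980, 1980],
    'manufacturing': [0.270, 0.730, 0.925, 0.075, 0.880, 0.120],
})

# One row per year, one column per country group
wide = shares.pivot(index='year', columns='Country', values='manufacturing')
# Distance of the total from 100% in each year
gap = (wide.sum(axis=1) - 1).abs()
print(wide.assign(total=wide.sum(axis=1)))
```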

In [92]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to set up a color map for matplotlib (my_cmap is not used in the plot below)
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.Country.apply(lambda x: x in ['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')
In [93]:
fig
Out[93]:
In [94]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

manufacturing['dev_level'] = manufacturing.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[:11],
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-Dev.pdf', dpi=300, bbox_inches='tight')
In [95]:
fig
Out[95]:
In [96]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[11:],
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-NonDev.pdf', dpi=300, bbox_inches='tight')
In [97]:
fig
Out[97]:
In [98]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[
                 (manufacturing.Country.apply(lambda x: x in ['India', 'United Kingdom'])) & 
                 (manufacturing.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[:2],
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'manufacturing-UK-IND.pdf', dpi=300, bbox_inches='tight')
In [99]:
fig
Out[99]:
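matplotlib's StrMethodFormatter formats each tick value, named x, with Python's str.format. A quick way to choose a format string is therefore to try it on a typical value in plain Python: for shares stored as fractions, '{x:,.0f}' rounds every share to 0 or 1, while '{x:,.0%}' multiplies by 100 and adds a percent sign. The same applies to the '{x:,.1%}' format used for growth rates earlier (share and growth are illustrative values):

```python
share = 0.32     # a manufacturing share stored as a fraction
growth = 0.0123  # an annual growth rate stored as a fraction

as_number = '{x:,.0f}'.format(x=share)    # rounds to a whole number
as_percent = '{x:,.0%}'.format(x=share)   # scales by 100, adds '%'
growth_label = '{x:,.1%}'.format(x=growth)
print(as_number, as_percent, growth_label)
```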

Industrial Potential

We can also explore the industrial potential of these countries (UK in 1900 = 100).


In [100]:
# 1750-1913
indpotential = [['Developed Countries', 34.4, 47.4, 72.9, 143.2, 253.1, 481.2, 863.0],
                ['Europe', 29.6, 41.2, 63.0, 120.3, 196.2, 335.4, 527.8],
                ['Austria-Hungary', 3.7, 4.8, 5.8, 9.5, 14.0, 25.6, 40.7],
                ['Belgium', 0.4, 0.7, 1.3, 3.1, 5.7, 9.2, 16.3],
                ['France', 5.0, 6.2, 9.5, 17.9, 25.1, 36.8, 57.3],
                ['Germany', 3.7, 5.2, 6.5, 11.1, 27.4, 71.2, 137.7],
                ['Italy', 3.1, 3.7, 4.2, 5.7, 8.1, 13.6, 22.5],
                ['Russia', 6.4, 8.3, 10.3, 15.8, 24.5, 47.5, 76.6],
                ['Spain', 1.6, 2.1, 2.7, 4.0, 5.8, 8.5, 11.0],
                ['Sweden', 0.3, 0.5, 0.6, 1.4, 2.6, 5.0, 9.0],
                ['Switzerland', 0.2, 0.4, 0.8, 1.6, 2.6, 5.4, 8.0],
                ['United Kingdom', 2.4, 6.2, 17.5, 45.0, 73.3, 100.0, 127.2],
                ['Canada', np.nan, np.nan, 0.1, 0.6, 1.4, 3.2, 8.7],
                ['United States', 0.1, 1.1, 4.6, 16.2, 46.9, 127.8, 298.1],
                ['Japan', 4.8, 5.1, 5.2, 5.8, 7.6, 13.0, 25.1],
                ['Third World', 92.9, 99.4, 111.5, 82.7, 67.0, 59.6, 69.5],
                ['China', 41.7, 48.8, 54.9, 44.1, 39.9, 33.5, 33.3],
                ['India', 31.2, 29.0, 32.5, 19.4, 8.8, 9.3, 13.1],
                ['Brazil', np.nan, np.nan, np.nan, 0.9, 0.9, 2.1, 4.3],
                ['Mexico', np.nan, np.nan, np.nan, 0.9, 0.8, 1.7, 2.7],
                ['World', 127.3, 146.9, 184.4, 225.9, 320.1, 540.8, 932.5]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
indpotential = pd.DataFrame(indpotential, columns=['Country'] + ['y'+str(y) for y in years])

# 1913-1980
indpotential2 = [['Developed Countries', 863, 1259, 1562, 2870, 4699, 8432, 9718],
                 ['Market Economies', 715, 1089, 1288, 2380, 3624, 6547, 7388],
                 ['Europe', 380, 480, 629, 801, 1361, 2290, 2529],
                 ['Belgium', 16, 22, 18, 25, 41, 69, 76],
                 ['France', 57, 82, 74, 98, 194, 328, 362],
                 ['Germany', 138, 158, 214, 180, 330, 550, 590],
                 ['Italy', 23, 37, 46, 71, 150, 258, 319],
                 ['Spain', 11, 16, 14, 22, 43, 122, 156],
                 ['Sweden', 9, 12, 21, 28, 48, 80, 83],
                 ['Switzerland', 8, 9, 9, 20, 37, 57, 54],
                 ['United Kingdom', 127, 135, 181, 258, 330, 462, 441],
                 ['Canada', 9, 20, 23, 66, 109, 199, 220],
                 ['United States', 298, 533, 528, 1373, 1804, 3089, 3475],
                 ['Japan', 25, 45, 88, 88, 264, 819, 1001],
                 ['U.S.S.R.', 77, 72, 152, 328, 760, 1345, 1630],
                 ['Third World', 70, 98, 122, 200, 439, 927, 1323],
                 ['India', 13, 26, 40, 52, 91, 194, 254],
                 ['Brazil', 4, 8, 10, 18, 42, 102, 159],
                 ['Mexico', 3, 3, 4, 9, 21, 47, 68],
                 ['China', 33, 46, 52, 71, 178, 369, 553],
                 ['World', 933, 1356, 1684, 3070, 5138, 9359, 11041]]

years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
indpotential2 = pd.DataFrame(indpotential2, columns=['Country'] + ['y'+str(y) for y in years])

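# Merge: 1913 appears in both tables, so drop it from the second one.
# With on='Country' and how='inner', only countries present in both tables are kept
# (e.g., Austria-Hungary, Russia, Market Economies and U.S.S.R. are dropped)
indpotential = indpotential.merge(indpotential2[indpotential2.columns.difference(['y1913'])], on='Country', how='inner')
# Reshape from wide (one column per year) to long (one row per country-year)
indpotential = pd.wide_to_long(indpotential, ['y'], i='Country', j='year').reset_index()
indpotential.rename(columns={'y':'indpotential'}, inplace=True)
indpotential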
Out[100]:
Country year indpotential
0 Developed Countries 1750 34.4
1 Europe 1750 29.6
2 Belgium 1750 0.4
3 France 1750 5.0
4 Germany 1750 3.7
... ... ... ...
242 China 1980 553.0
243 India 1980 254.0
244 Brazil 1980 159.0
245 Mexico 1980 68.0
246 World 1980 11041.0

247 rows × 3 columns
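To see what pd.wide_to_long does in the cell above, here is a minimal sketch on a small hypothetical table (the country names and numbers are made up). All columns that start with the stub y followed by digits are stacked into a single column, and the digits become the new year index level.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year
wide = pd.DataFrame({
    'Country': ['A', 'B'],
    'y1750': [1.0, 2.0],
    'y1800': [3.0, 4.0],
})

# Stack the 'y<year>' columns; the digits after the stub become the 'year' level
long_df = pd.wide_to_long(wide, ['y'], i='Country', j='year').reset_index()
long_df = long_df.rename(columns={'y': 'value'})
print(long_df.sort_values(['Country', 'year']))
```

Each country now has one row per year, which is the shape sns.lineplot expects when we pass x='year' and hue='Country'.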

In [101]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.Country.apply(lambda x: x in ['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')
In [102]:
fig
Out[102]:
In [103]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

indpotential['dev_level'] = indpotential.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[:11],
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-Dev.pdf', dpi=300, bbox_inches='tight')
In [104]:
fig
Out[104]:
In [105]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[11:],
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-NonDev.pdf', dpi=300, bbox_inches='tight')
In [106]:
fig
Out[106]:
In [107]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[
                 (indpotential.Country.apply(lambda x: x in ['India', 'United Kingdom'])) & 
                 (indpotential.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers[:2],
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='')
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-UK-IND.pdf', dpi=300, bbox_inches='tight')
In [108]:
fig
Out[108]:
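Two pandas idioms used in the cells above may deserve a closer look. Series.map with a dictionary returns NaN for any key that is not in the dictionary, which is why aggregates such as 'World' get no dev_level and drop out of the filtered plots. And the row filter Country.apply(lambda x: x in [...]) can also be written with the vectorized Series.isin. A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical long table with two countries and one aggregate
df = pd.DataFrame({
    'Country': ['India', 'United Kingdom', 'World'],
    'year': [1900, 1900, 1900],
})

# Keys missing from the dictionary (here 'World') are mapped to NaN
dev_level = {'India': 'Developing', 'United Kingdom': 'Developed'}
df['dev_level'] = df.Country.map(dev_level)

# Row-by-row membership test, as in the cells above
mask_apply = df.Country.apply(lambda x: x in ['India', 'United Kingdom'])
# Vectorized equivalent
mask_isin = df.Country.isin(['India', 'United Kingdom'])
print(df[mask_isin])
```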

Persistence

Let's explore the persistence of economic development since 1950. To do so, let's get data from the Penn World Table and the World Bank.

Penn World Table

Let's start by importing the data from the Penn World Table (PWT).

In [109]:
try:
    # Use local copies if they were saved in a previous run
    pwt_xls = pd.read_excel(pathout + 'pwt100.xlsx')
    pwt = pd.read_stata(pathout + 'pwt100.dta')
except FileNotFoundError:
    # Otherwise download the variable definitions (2nd sheet) and the data, and save local copies
    pwt_xls = pd.read_excel('https://www.rug.nl/ggdc/docs/pwt100.xlsx', sheet_name=1)
    pwt = pd.read_stata('https://www.rug.nl/ggdc/docs/pwt100.dta')
    pwt_xls.to_excel(pathout + 'pwt100.xlsx', index=False)
    pwt.to_stata(pathout + 'pwt100.dta', write_index=False, version=117)

# Get labels of variables
with pd.io.stata.StataReader(pathout + 'pwt100.dta') as reader:
    pwt_labels = reader.variable_labels()

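The Excel file gives us the definitions of the variables, while the Stata file has the data (of course, the Excel file also has the data). Note that the variable labels below are empty. This is most likely because we read them from our local copy of the Stata file, and to_stata only writes the labels passed through its variable_labels argument, so the copy we saved has none.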

In [110]:
pwt_labels
Out[110]:
{'countrycode': '',
 'country': '',
 'currency_unit': '',
 'year': '',
 'rgdpe': '',
 'rgdpo': '',
 'pop': '',
 'emp': '',
 'avh': '',
 'hc': '',
 'ccon': '',
 'cda': '',
 'cgdpe': '',
 'cgdpo': '',
 'cn': '',
 'ck': '',
 'ctfp': '',
 'cwtfp': '',
 'rgdpna': '',
 'rconna': '',
 'rdana': '',
 'rnna': '',
 'rkna': '',
 'rtfpna': '',
 'rwtfpna': '',
 'labsh': '',
 'irr': '',
 'delta': '',
 'xr': '',
 'pl_con': '',
 'pl_da': '',
 'pl_gdpo': '',
 'i_cig': '',
 'i_xm': '',
 'i_xr': '',
 'i_outlier': '',
 'i_irr': '',
 'cor_exp': '',
 'statcap': '',
 'csh_c': '',
 'csh_i': '',
 'csh_g': '',
 'csh_x': '',
 'csh_m': '',
 'csh_r': '',
 'pl_c': '',
 'pl_i': '',
 'pl_g': '',
 'pl_x': '',
 'pl_m': '',
 'pl_n': '',
 'pl_k': ''}
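A minimal sketch of that behavior, using a hypothetical two-column frame written to a temporary file: only the column given a label through variable_labels gets one, and StataReader.variable_labels returns an empty string for the other.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataReader

# Hypothetical data with the same column names as two PWT variables
df = pd.DataFrame({'rgdpe': [1.0, 2.0], 'pop': [0.5, 0.8]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.dta')
    # Only 'rgdpe' gets a label; 'pop' is written without one
    df.to_stata(path, write_index=False, version=117,
                variable_labels={'rgdpe': 'Expenditure-side real GDP'})
    # Read the labels back (StataReader works as a context manager)
    with StataReader(path) as reader:
        labels = reader.variable_labels()

print(labels)
```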
In [111]:
pwt_xls
Out[111]:
Variable name Variable definition
0 Identifier variables NaN
1 countrycode 3-letter ISO country code
2 country Country name
3 currency_unit Currency unit
4 year Year
... ... ...
62 pl_g Price level of government consumption, price ...
63 pl_x Price level of exports, price level of USA GDP...
64 pl_m Price level of imports, price level of USA GDP...
65 pl_n Price level of the capital stock, price level ...
66 pl_k Price level of the capital services, price lev...

67 rows × 2 columns

In [112]:
pwt
Out[112]:
countrycode country currency_unit year rgdpe rgdpo pop emp avh hc ... csh_x csh_m csh_r pl_c pl_i pl_g pl_x pl_m pl_n pl_k
0 ABW Aruba Aruban Guilder 1950 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 ABW Aruba Aruban Guilder 1951 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 ABW Aruba Aruban Guilder 1952 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 ABW Aruba Aruban Guilder 1953 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 ABW Aruba Aruban Guilder 1954 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12805 ZWE Zimbabwe US Dollar 2015 40141.617188 39798.644531 13.814629 6.393752 NaN 2.584653 ... 0.140172 -0.287693 -0.051930 0.479228 0.651287 0.541446 0.616689 0.533235 0.422764 1.533909
12806 ZWE Zimbabwe US Dollar 2016 41875.203125 40963.191406 14.030331 6.504374 NaN 2.616257 ... 0.131920 -0.251232 -0.016258 0.470640 0.651027 0.539631 0.619789 0.519718 0.416510 1.491724
12807 ZWE Zimbabwe US Dollar 2017 44672.175781 44316.742188 14.236595 6.611773 NaN 2.648248 ... 0.126722 -0.202827 -0.039897 0.473560 0.639560 0.519956 0.619739 0.552042 0.415592 1.514525
12808 ZWE Zimbabwe US Dollar 2018 44325.109375 43420.898438 14.438802 6.714952 NaN 2.680630 ... 0.144485 -0.263658 -0.020791 0.543757 0.655473 0.529867 0.641361 0.561526 0.425143 1.590120
12809 ZWE Zimbabwe US Dollar 2019 42296.062500 40826.570312 14.645468 6.831017 NaN 2.713408 ... 0.213562 -0.270959 -0.089798 0.494755 0.652439 0.500927 0.487763 0.430082 0.420675 1.384068

12810 rows × 52 columns

In [113]:
# Describe the data
pwt.describe()
Out[113]:
year rgdpe rgdpo pop emp avh hc ccon cda cgdpe ... csh_x csh_m csh_r pl_c pl_i pl_g pl_x pl_m pl_n pl_k
count 12810.000000 1.039900e+04 1.039900e+04 10399.000000 9529.000000 3492.000000 8637.000000 1.039900e+04 1.039900e+04 1.039900e+04 ... 10399.000000 10399.000000 10399.000000 10399.000000 10399.000000 10399.000000 10399.000000 10399.000000 10314.000000 7095.000000
mean 1984.500000 3.048523e+05 3.070802e+05 30.962982 14.171166 1986.923200 2.087200 2.249465e+05 3.049463e+05 3.061396e+05 ... 0.229317 -0.300829 0.017791 0.370850 0.423964 0.345635 0.409820 0.403422 0.359145 1.334313
std 20.205986 1.214332e+06 1.218457e+06 116.189454 58.056976 284.003338 0.727413 8.882342e+05 1.236096e+06 1.226056e+06 ... 0.266793 0.640212 0.216200 0.424091 0.635708 0.408015 0.196283 0.202566 0.435645 1.941830
min 1950.000000 2.036377e+01 2.765232e+01 0.004425 0.001200 1380.607643 1.007038 1.604856e+01 2.176663e+01 2.026185e+01 ... -1.937363 -23.237627 -12.568965 0.015589 0.006002 0.009270 0.007354 0.020806 0.012977 0.064379
25% 1967.000000 6.801782e+03 7.191773e+03 1.579663 0.775101 1788.478805 1.450483 5.892980e+03 7.194495e+03 6.689781e+03 ... 0.067027 -0.379584 -0.025204 0.171226 0.191958 0.116110 0.237652 0.240777 0.164562 0.650612
50% 1984.500000 3.031913e+04 3.084435e+04 6.150688 2.856044 1972.355973 1.987572 2.465417e+04 3.134864e+04 3.031841e+04 ... 0.140116 -0.200254 0.000326 0.306258 0.377943 0.244554 0.443856 0.452949 0.295903 0.955435
75% 2002.000000 1.559740e+05 1.587386e+05 19.934229 8.266107 2168.035042 2.674011 1.133967e+05 1.532297e+05 1.559242e+05 ... 0.300332 -0.102694 0.044529 0.484549 0.557087 0.452951 0.557034 0.541063 0.447797 1.415420
max 2019.000000 2.086051e+07 2.059584e+07 1433.783686 799.306641 3039.794005 4.351568 1.682624e+07 2.138355e+07 2.079136e+07 ... 3.523480 32.874020 7.598285 23.122841 34.444988 18.420809 2.056070 4.990355 20.649197 31.932846

8 rows × 44 columns

Computing $\log$ GDP per capita

Now we can create new variables, transform them, and plot the data.

To compute the $\log$ of income per capita (GDPpc), we first need the name of the column that contains GDPpc in the dataframe. To find it, let's look for the variables whose description contains the word "capita".

In [114]:
pwt_xls.columns
Out[114]:
Index(['Variable name', 'Variable definition'], dtype='object')

To be able to read the definitions better, let's tell pandas to show us more content.

In [115]:
pd.set_option("display.max_columns", 20)
pd.set_option('display.max_rows', 50)
pd.set_option('display.width', 1000)
#pd.set_option('display.max_colwidth', None)  # uncomment to show full definitions (None = no limit)
In [116]:
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).lower().find('capita')!=-1)]
Out[116]:
Variable name Variable definition
12 hc Human capital index, based on years of schooli...
19 cn Capital stock at current PPPs (in mil. 2017US$)
20 ck Capital services levels at current PPPs (USA=1)
28 rnna Capital stock at constant 2017 national prices...
29 rkna Capital services at constant 2017 national pri...
34 delta Average depreciation rate of the capital stock
47 i_irr 0/1/2/3: the observation for irr is not an out...
53 csh_i Share of gross capital formation at current PPPs
61 pl_i Price level of capital formation, price level...
65 pl_n Price level of the capital stock, price level ...
66 pl_k Price level of the capital services, price lev...

All of these matches contain "capital" rather than "per capita", so the data does not seem to include GDP per capita directly. But do not panic... we know how to compute it from GDP and population. Let's do it!
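The search above also matched "capital" because find('capita') looks for a substring. One way to look only for the whole word "capita" is a regular expression with word boundaries (\b), passed to Series.str.contains. A sketch on a few hypothetical definitions, including a missing one like the section headers in pwt_xls:

```python
import pandas as pd

# Hypothetical variable definitions; None mimics rows without a definition
definitions = pd.Series([
    'Capital stock at current PPPs',
    'Real GDP per capita',
    'Population (in millions)',
    None,
])

# Substring search: also matches 'Capital'
substring = definitions.str.contains('capita', case=False, na=False)
# Whole-word search: r'\bcapita\b' does not match inside 'capital'
whole_word = definitions.str.contains(r'\bcapita\b', case=False, na=False, regex=True)
print(definitions[whole_word])
```

Here na=False treats missing definitions as non-matches, much like str(x) turns them into the string 'nan' in the cells above.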

Identify the name of the variable for GDP

In [117]:
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).upper().find('GDP')!=-1)]
Out[117]:
Variable name Variable definition
7 rgdpe Expenditure-side real GDP at chained PPPs (in ...
8 rgdpo Output-side real GDP at chained PPPs (in mil. ...
17 cgdpe Expenditure-side real GDP at current PPPs (in ...
18 cgdpo Output-side real GDP at current PPPs (in mil. ...
25 rgdpna Real GDP at constant 2017 national prices (in ...
32 labsh Share of labour compensation in GDP at current...
38 pl_con Price level of CCON (PPP/XR), price level of U...
39 pl_da Price level of CDA (PPP/XR), price level of US...
40 pl_gdpo Price level of CGDPo (PPP/XR), price level of ...
46 i_outlier 0/1: the observation on pl_gdpe or pl_gdpo is ...
57 csh_r Share of residual trade and GDP statistical di...
60 pl_c Price level of household consumption, price l...
61 pl_i Price level of capital formation, price level...
62 pl_g Price level of government consumption, price ...
63 pl_x Price level of exports, price level of USA GDP...
64 pl_m Price level of imports, price level of USA GDP...

Identify the name of the variable for population

In [118]:
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).lower().find('population')!=-1)]
Out[118]:
Variable name Variable definition
9 pop Population (in millions)

Create new variables (columns) with real GDPpc for each of the real GDP measures included in the PWT.

In [119]:
# Get columns with GDP measures
gdpcols = pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).upper().find('REAL GDP')!=-1), 'Variable name'].tolist()

# Generate GDPpc for each measure
for gdp in gdpcols:
    pwt[gdp + '_pc'] = pwt[gdp] / pwt['pop']

# GDPpc data
gdppccols = [col+'_pc' for col in gdpcols]
pwt[['countrycode', 'country', 'year'] + gdppccols]
Out[119]:
countrycode country year rgdpe_pc rgdpo_pc cgdpe_pc cgdpo_pc rgdpna_pc
0 ABW Aruba 1950 NaN NaN NaN NaN NaN
1 ABW Aruba 1951 NaN NaN NaN NaN NaN
2 ABW Aruba 1952 NaN NaN NaN NaN NaN
3 ABW Aruba 1953 NaN NaN NaN NaN NaN
4 ABW Aruba 1954 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
12805 ZWE Zimbabwe 2015 2905.732553 2880.905780 2892.674328 2856.095690 3040.848887
12806 ZWE Zimbabwe 2016 2984.619759 2919.616893 2970.770578 2912.558803 3016.730437
12807 ZWE Zimbabwe 2017 3137.841301 3112.875107 3137.841301 3112.875107 3112.875107
12808 ZWE Zimbabwe 2018 3069.860600 3007.236919 3071.061791 3017.391036 3217.517468
12809 ZWE Zimbabwe 2019 2887.996649 2787.658975 2889.980517 2805.080907 2915.172824

12810 rows × 8 columns
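As a quick check of the units, take the Zimbabwe 2015 values of rgdpe and pop printed earlier. In PWT 10.0, rgdpe is measured in millions of 2017 US dollars and pop in millions of people, so the millions cancel and the ratio is real GDP per person in 2017 US dollars.

```python
# Values copied from the PWT output above (Zimbabwe, 2015)
rgdpe = 40141.617188   # real GDP, millions of 2017 US$
pop = 13.814629        # population, millions

# Millions cancel: the ratio is GDP per person in 2017 US$
rgdpe_pc = rgdpe / pop
print(round(rgdpe_pc, 2))
```

The result matches the rgdpe_pc value shown for Zimbabwe in 2015.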

Now let's use the apply function to compute logs.

In [120]:
pwt[['l'+col for col in gdppccols]] = pwt[gdppccols].apply(np.log, axis=1)
pwt[['countrycode', 'country', 'year'] + ['l'+col for col in gdppccols]]
Out[120]:
countrycode country year lrgdpe_pc lrgdpo_pc lcgdpe_pc lcgdpo_pc lrgdpna_pc
0 ABW Aruba 1950 NaN NaN NaN NaN NaN
1 ABW Aruba 1951 NaN NaN NaN NaN NaN
2 ABW Aruba 1952 NaN NaN NaN NaN NaN
3 ABW Aruba 1953 NaN NaN NaN NaN NaN
4 ABW Aruba 1954 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
12805 ZWE Zimbabwe 2015 7.974441 7.965860 7.969937 7.957211 8.019892
12806 ZWE Zimbabwe 2016 8.001228 7.979208 7.996577 7.976787 8.011929
12807 ZWE Zimbabwe 2017 8.051290 8.043302 8.051290 8.043302 8.043302
12808 ZWE Zimbabwe 2018 8.029387 8.008777 8.029779 8.012148 8.076365
12809 ZWE Zimbabwe 2019 7.968318 7.932957 7.969005 7.939188 7.977684

12810 rows × 8 columns
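Since np.log is a NumPy universal function, it can also be applied to a whole DataFrame at once. A minimal sketch on a hypothetical frame with missing values, showing that this gives the same result as the row-by-row apply above (and it is typically faster on a frame with thousands of rows):

```python
import numpy as np
import pandas as pd

# Hypothetical GDP per capita values, with some missing observations
gdppc = pd.DataFrame({'rgdpe_pc': [2905.7, np.nan, 50000.0],
                      'rgdpo_pc': [2880.9, 1000.0, np.nan]})

# Row by row, as in the cell above
logs_apply = gdppc.apply(np.log, axis=1)
# Vectorized: the ufunc is applied element-wise to the whole frame; NaN stays NaN
logs_vec = np.log(gdppc)
print(logs_vec)
```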

How correlated are these measures of log GDP per capita?

In [121]:
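# Use year as the index and keep only the numeric log GDPpc columns
# (the string columns countrycode and country would make corr fail in recent pandas versions)
pwt.set_index('year')[['l'+col for col in gdppccols]].groupby(level='year').corr()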
Out[121]:
lrgdpe_pc lrgdpo_pc lcgdpe_pc lcgdpo_pc lrgdpna_pc
year
1950 lrgdpe_pc 1.000000 0.995984 0.999441 0.995318 0.818344
lrgdpo_pc 0.995984 1.000000 0.996080 0.999158 0.823961
lcgdpe_pc 0.999441 0.996080 1.000000 0.996412 0.821171
lcgdpo_pc 0.995318 0.999158 0.996412 1.000000 0.828410
lrgdpna_pc 0.818344 0.823961 0.821171 0.828410 1.000000
... ... ... ... ... ... ...
2019 lrgdpe_pc 1.000000 0.996471 0.999999 0.996911 0.994584
lrgdpo_pc 0.996471 1.000000 0.996466 0.999960 0.997909
lcgdpe_pc 0.999999 0.996466 1.000000 0.996909 0.994581
lcgdpo_pc 0.996911 0.999960 0.996909 1.000000 0.997914
lrgdpna_pc 0.994584 0.997909 0.994581 0.997914 1.000000

350 rows × 5 columns

The measures seem highly correlated, but it is hard to tell from this table alone. Let's summarize each measure's correlations across all years.

In [122]:
pwt.set_index('year')[['l'+col for col in gdppccols]].groupby(level='year').corr().describe()
Out[122]:
lrgdpe_pc lrgdpo_pc lcgdpe_pc lcgdpo_pc lrgdpna_pc
count 350.000000 350.000000 350.000000 350.000000 350.000000
mean 0.980488 0.976436 0.980427 0.979111 0.935778
std 0.036614 0.037566 0.036591 0.036655 0.052340
min 0.818344 0.822508 0.821171 0.825543 0.818344
25% 0.982882 0.975386 0.982467 0.978188 0.907210
50% 0.996006 0.991856 0.996237 0.995459 0.930806
75% 0.999751 0.999158 0.999751 0.999158 0.987301
max 1.000000 1.000000 1.000000 1.000000 1.000000

OK. This gives us a better sense of how strongly correlated these measures of log GDP per capita are. In what follows we will use only one of them, namely Log[GDPpc] based on Expenditure-side real GDP at chained PPPs (in mil. 2017US$), i.e., lrgdpe_pc.
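One caveat about the summary statistics above: the stacked correlation matrices include their diagonal, so every column contains one entry equal to 1 per year by construction, which pulls the mean and the upper quantiles up. A sketch of summarizing only the off-diagonal correlations, on a small hypothetical panel (random data with a fixed seed):

```python
import numpy as np
import pandas as pd

# Hypothetical panel: two years, five countries, three noisy measures of the same thing
rng = np.random.default_rng(0)
base = rng.normal(8, 1, size=10)
toy = pd.DataFrame({
    'year': [1960] * 5 + [1970] * 5,
    'a': base,
    'b': base + rng.normal(0, 0.1, size=10),
    'c': base + rng.normal(0, 0.5, size=10),
}).set_index('year')

# One 3x3 correlation matrix per year, stacked; the 2nd index level is the row variable
corr = toy.groupby(level='year').corr()
row_var = corr.index.get_level_values(1)

# Blank out the diagonal (row variable == column variable) before summarizing
offdiag = corr.apply(lambda col: col.where(row_var != col.name))
print(offdiag.describe())
```

With three measures and two years, each column keeps 2 × 2 = 4 off-diagonal correlations.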

Convergence post-1960?

Let's start by looking at the distribution of Log[GDPpc] in 1960. For this we need to subset our dataframe and select only the rows for the year 1960, which we can do with the loc indexer of the dataframe.

In [123]:
gdppc1960 = pwt.loc[pwt.year==1960, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
gdppc1960
Out[123]:
countrycode country year lrgdpe_pc
10 ABW Aruba 1960 NaN
80 AGO Angola 1960 NaN
150 AIA Anguilla 1960 NaN
220 ALB Albania 1960 NaN
290 ARE United Arab Emirates 1960 NaN
... ... ... ... ...
12470 VNM Viet Nam 1960 NaN
12540 YEM Yemen 1960 NaN
12610 ZAF South Africa 1960 8.783560
12680 ZMB Zambia 1960 7.958144
12750 ZWE Zimbabwe 1960 7.818258

183 rows × 4 columns

gdppc1960 has the data for all countries in the year 1960. We can plot its histogram using the dataframe's own plotting methods.

In [124]:
gdppc1960.lrgdpe_pc.hist()
Out[124]:
<AxesSubplot:>

We can also plot it using the seaborn package. Let's plot the kernel density of the distribution.

In [125]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, fill=True, label='1960', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1960-density.pdf', dpi=300, bbox_inches='tight')
In [126]:
fig
Out[126]:
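For intuition about what sns.kdeplot draws: a Gaussian kernel density estimate places a normal bump of width $h$ (the bandwidth) on each of the $n$ observations and averages them, $\hat f(x) = \frac{1}{nh}\sum_{i=1}^{n} \phi\left(\frac{x - x_i}{h}\right)$, where $\phi$ is the standard normal density. The sketch below computes this by hand with NumPy on hypothetical log incomes; the bandwidth h=0.5 is an arbitrary choice here, while seaborn picks one with a rule of thumb by default, so its curve will differ in detail.

```python
import numpy as np

def gaussian_kde_by_hand(data, grid, h):
    """Average of normal densities with standard deviation h centred at each observation."""
    data = np.asarray(data, dtype=float)
    data = data[~np.isnan(data)]  # like kdeplot, ignore missing values
    z = (grid[:, None] - data[None, :]) / h
    phi = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return phi.sum(axis=1) / (len(data) * h)

# Hypothetical log incomes for five countries, plus a missing value
log_income = [6.9, 7.4, 7.8, 8.8, 9.5, np.nan]
grid = np.linspace(4, 12, 801)
density = gaussian_kde_by_hand(log_income, grid, h=0.5)

# A density integrates to (approximately) 1 over a wide enough grid
total = density.sum() * (grid[1] - grid[0])
print(round(total, 4))
```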

Let's now also include the distributions for other years.

In [137]:
gdppc1980 = pwt.loc[pwt.year==1980, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, fill=True, label='1960', linewidth=2)
sns.kdeplot(gdppc1980.lrgdpe_pc, ax=ax, fill=True, label='1980', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
ax.legend()
plt.savefig(pathgraphs + 'y1960-1980-density.pdf', dpi=300, bbox_inches='tight')
In [138]:
fig
Out[138]:
In [139]:
gdppc2000 = pwt.loc[pwt.year==2000, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, fill=True, label='1960', linewidth=2)
sns.kdeplot(gdppc1980.lrgdpe_pc, ax=ax, fill=True, label='1980', linewidth=2)
sns.kdeplot(gdppc2000.lrgdpe_pc, ax=ax, fill=True, label='2000', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
ax.legend()
plt.savefig(pathgraphs + 'y1960-2000-density.pdf', dpi=300, bbox_inches='tight')
In [140]:
fig
Out[140]:

Let's show the evolution of the distribution by looking at it every 10 years from 1950 onwards, plus the last year available. Moreover, let's do everything in a single loop.

In [141]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Every 10 years from 1950, plus the last year in the data
# (range excludes its end, so no decade beyond the data, such as 2020, is requested)
last = int(pwt.year.max())
period = list(range(1950, last, 10)) + [last]
#mycolors = sns.color_palette("GnBu", n_colors=len(period)+5)
mycolors = sns.cubehelix_palette(len(period), start=.5, rot=-.75)
# Plot
fig, ax = plt.subplots()
for k, t in enumerate(period):
    sns.kdeplot(pwt.loc[pwt.year==t].lrgdpe_pc, ax=ax, fill=True, label=str(t), linewidth=2, color=mycolors[k])
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
ax.legend()
plt.savefig(pathgraphs + 'y1950-2010-density.pdf', dpi=300, bbox_inches='tight')
In [142]:
fig
Out[142]:

Persistence

The lack of convergence in the last 60 years suggests that there is some persistence in (recent) development. Let's explore this by plotting the association between GDP per capita in different periods. To make things more comparable, let's normalize income levels relative to the US by dividing each country's log GDP per capita by the US value in the same year. To do so, it is easier to use the year as the index of the dataframe, so that the US value for each year is matched to every country in that year.

In [143]:
pwt.set_index('year', inplace=True)
pwt['lrgdpe_pc_US'] = pwt.loc[pwt.countrycode=='USA', 'lrgdpe_pc']
pwt['lrgdpe_pc_rel'] = pwt.lrgdpe_pc / pwt.lrgdpe_pc_US
pwt.reset_index(inplace=True)
pwt[['countrycode', 'country', 'year', 'lrgdpe_pc_rel']]
Out[143]:
countrycode country year lrgdpe_pc_rel
0 ABW Aruba 1950 NaN
1 ABW Aruba 1951 NaN
2 ABW Aruba 1952 NaN
3 ABW Aruba 1953 NaN
4 ABW Aruba 1954 NaN
... ... ... ... ...
12805 ZWE Zimbabwe 2015 0.726013
12806 ZWE Zimbabwe 2016 0.727573
12807 ZWE Zimbabwe 2017 0.730951
12808 ZWE Zimbabwe 2018 0.727346
12809 ZWE Zimbabwe 2019 0.720651

12810 rows × 4 columns
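Two details of the cell above can be checked on a small hypothetical panel (made-up numbers). First, because year is the index, assigning the US series (one value per year) to the whole dataframe aligns it on year, so every country gets the US value of the same year. Second, lrgdpe_pc_rel is a ratio of logs; the log of relative income, which one might use instead, is the difference of logs, log(GDPpc) - log(GDPpc of the US). The sketch computes both.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: the US and one other country, two years
toy = pd.DataFrame({
    'countrycode': ['USA', 'USA', 'AAA', 'AAA'],
    'year': [1960, 2019, 1960, 2019],
    'gdppc': [20000.0, 60000.0, 1000.0, 6000.0],
})
toy['lgdppc'] = np.log(toy.gdppc)

# With 'year' as the index, the US series (unique years) is aligned to every row of that year
toy = toy.set_index('year')
toy['lgdppc_US'] = toy.loc[toy.countrycode == 'USA', 'lgdppc']
toy['ratio_of_logs'] = toy.lgdppc / toy.lgdppc_US   # normalization used in the text
toy['log_of_ratio'] = toy.lgdppc - toy.lgdppc_US    # equals log(gdppc / gdppc_US)
toy = toy.reset_index()
print(toy)
```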

Let's plot relative income levels in 1960 against those in 1980, 2000 and 2019. First, let's create the wide version of this data.

In [144]:
relgdppc = pwt[['countrycode', 'year', 'lrgdpe_pc_rel']].pivot(index='countrycode', columns='year', values='lrgdpe_pc_rel')
relgdppc.columns = ['y' + str(col) for col in relgdppc.columns]
relgdppc.reset_index(inplace=True)
relgdppc
Out[144]:
countrycode y1950 y1951 y1952 y1953 y1954 y1955 y1956 y1957 y1958 ... y2010 y2011 y2012 y2013 y2014 y2015 y2016 y2017 y2018 y2019
0 ABW NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.971785 0.972698 0.966964 0.965812 0.961921 0.959951 0.958192 0.956832 0.953101 0.951019
1 AGO NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.799210 0.820745 0.825169 0.825301 0.827355 0.815370 0.809379 0.809938 0.811589 0.802877
2 AIA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.947898 0.946339 0.937914 0.932076 0.934279 0.934142 0.928033 0.913169 0.915423 0.917034
3 ALB NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.845753 0.846474 0.847895 0.845258 0.847195 0.847901 0.847705 0.850693 0.852540 0.852854
4 ARE NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.019227 1.029266 1.026225 1.023208 1.024195 1.013461 1.010135 1.010522 1.005990 1.008647
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
178 VNM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.771328 0.777442 0.784826 0.786367 0.789719 0.792964 0.797312 0.802582 0.806857 0.810306
179 YEM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.772557 0.758383 0.749334 0.747058 0.744926 0.707611 0.686753 0.657612 0.674280 0.673591
180 ZAF 0.89849 0.893105 0.881282 0.88735 0.895034 0.891319 0.894591 0.897244 0.896683 ... 0.868685 0.871505 0.869000 0.867254 0.864386 0.862042 0.860408 0.859622 0.857125 0.855232
181 ZMB NaN NaN NaN NaN NaN 0.814531 0.817760 0.797963 0.786983 ... 0.748095 0.756346 0.748758 0.745580 0.744860 0.736409 0.737912 0.733792 0.732958 0.731187
182 ZWE NaN NaN NaN NaN 0.780592 0.776128 0.781405 0.787890 0.786626 ... 0.706880 0.709856 0.722509 0.725709 0.724249 0.726013 0.727573 0.730951 0.727346 0.720651

183 rows × 71 columns

In [147]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
k = 0
fig, ax = plt.subplots()
ax.plot([relgdppc.y1960.min()*.99, relgdppc.y1960.max()*1.01], [relgdppc.y1960.min()*.99, relgdppc.y1960.max()*1.01], c='r', label='45 degree')
sns.regplot(x='y1960', y='y2019', data=relgdppc, ax=ax, label='1960-2019')
movex = relgdppc.y1960.mean() * 0.006125
movey = relgdppc.y2019.mean() * 0.006125
for line in range(relgdppc.shape[0]):
    # Label only the countries observed in both years
    if pd.notna(relgdppc.y1960[line]) and pd.notna(relgdppc.y2019[line]):
        ax.text(relgdppc.y1960[line]+movex, relgdppc.y2019[line]+movey, relgdppc.countrycode[line], horizontalalignment='left', fontsize=12, color='black', weight='semibold')
ax.set_xlabel('Log[Income per capita 1960] relative to US')
ax.set_ylabel('Log[Income per capita in 2019] relative to US')
ax.legend()
plt.savefig(pathgraphs + '1960_versus_2019_drop.pdf', dpi=300, bbox_inches='tight')
In [148]:
fig
Out[148]:
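sns.regplot draws an OLS fit but does not report its coefficients. To put a number on the association, one can estimate the same line with np.polyfit; the slope measures how closely relative income in the later year tracks relative income in the earlier one. A sketch with hypothetical relative incomes (country codes and values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical relative (log) incomes; one country lacks 1960 data
rel = pd.DataFrame({
    'countrycode': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
    'y1960': [0.70, 0.80, 0.85, 0.95, np.nan],
    'y2019': [0.72, 0.79, 0.88, 0.96, 0.90],
})

# Like regplot, use only countries observed in both years
both = rel.dropna(subset=['y1960', 'y2019'])
slope, intercept = np.polyfit(both.y1960, both.y2019, deg=1)
print(f'slope = {slope:.3f}, intercept = {intercept:.3f}')
```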

Let's create a function that will simplify plotting this figure for various years.

In [149]:
def PersistencePlot(dfin, var0='y1960', var1='y2010', labelvar='countrycode', 
                    dx=0.006125, dy=0.006125, 
                    xlabel='Log[Income per capita 1960] relative to US', 
                    ylabel='Log[Income per capita in 2010] relative to US',
                    linelabel='1960-2010',
                    filename='1960_versus_2010_drop.pdf'):
    '''
    Plot the association between var0 and var1 in dfin, using labelvar to label the points.
    '''
    sns.set(rc={'figure.figsize':(11.7,8.27)})
    sns.set_context("talk")
    df = dfin.copy()
    df = df.dropna(subset=[var0, var1]).reset_index(drop=True)
    # Plot
    k = 0
    fig, ax = plt.subplots()
    ax.plot([df[var0].min()*.99, df[var0].max()*1.01], [df[var0].min()*.99, df[var0].max()*1.01], c='r', label='45 degree')
    sns.regplot(x=var0, y=var1, data=df, ax=ax, label=linelabel)
    movex = df[var0].mean() * dx
    movey = df[var1].mean() * dy
    for line in range(0,df.shape[0]):
        ax.text(df[var0][line]+movex, df[var1][line]+movey, df[labelvar][line], horizontalalignment='left', fontsize=12, color='black')
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()
    plt.savefig(pathgraphs + filename, dpi=300, bbox_inches='tight')
    pass
In [150]:
PersistencePlot(relgdppc, var0='y1980', var1='y2010', xlabel='Log[Income per capita 1980] relative to US',
                ylabel='Log[Income per capita in 2010] relative to US', linelabel='1980-2010',
                filename='1980_versus_2010_drop.pdf')
In [151]:
PersistencePlot(relgdppc.loc[(relgdppc.countrycode!='BRN')& (relgdppc.countrycode!='ARE')], var0='y1980', var1='y2010', xlabel='Log[Income per capita 1980] relative to US',
                ylabel='Log[Income per capita in 2010] relative to US', linelabel='1980-2010',
                filename='1980_versus_2010_drop.pdf')
In [152]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
period = list(range(1980, 2020, 20)) + [pwt.year.max()]
#mycolors = sns.color_palette("GnBu", n_colors=len(period)+5)
mycolors = sns.cubehelix_palette(len(period), start=.5, rot=-.75)
# Plot
fig, ax = plt.subplots()
for k, t in enumerate(period):
    sns.regplot(x='y1960', y='y'+str(t), data=relgdppc, ax=ax, label='1960-'+str(t), color=mycolors[k])
ax.set_xlabel('Log[Income per capita 1960] relative to US')
ax.set_ylabel('Log[Income per capita in other period] relative to US')
ax.legend()
Out[152]:
<matplotlib.legend.Legend at 0x1913702e0>
In [153]:
fig
Out[153]:

Getting data from the World Bank

The World Bank (WB) is a major source of free data. The pandas-datareader package, a companion to pandas, lets you download data through the APIs of many sources, including the WB, the OECD and FRED (see here).

In [154]:
from pandas_datareader import data, wb

We can now use wb to get information and data from the WB. Let's start by downloading the basic information about the countries included in the API.

In [155]:
wbcountries = wb.get_countries()
wbcountries['name'] = wbcountries.name.str.strip()
wbcountries
Out[155]:
iso3c iso2c name region adminregion incomeLevel lendingType capitalCity longitude latitude
0 ABW AW Aruba Latin America & Caribbean High income Not classified Oranjestad -70.0167 12.5167
1 AFE ZH Africa Eastern and Southern Aggregates Aggregates Aggregates NaN NaN
2 AFG AF Afghanistan South Asia South Asia Low income IDA Kabul 69.1761 34.5228
3 AFR A9 Africa Aggregates Aggregates Aggregates NaN NaN
4 AFW ZI Africa Western and Central Aggregates Aggregates Aggregates NaN NaN
... ... ... ... ... ... ... ... ... ... ...
294 XZN A5 Sub-Saharan Africa excluding South Africa and ... Aggregates Aggregates Aggregates NaN NaN
295 YEM YE Yemen, Rep. Middle East & North Africa Middle East & North Africa (excluding high inc... Low income IDA Sana'a 44.2075 15.3520
296 ZAF ZA South Africa Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Upper middle income IBRD Pretoria 28.1871 -25.7460
297 ZMB ZM Zambia Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Low income IDA Lusaka 28.2937 -15.3982
298 ZWE ZW Zimbabwe Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income Blend Harare 31.0672 -17.8312

299 rows × 10 columns
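This table mixes individual countries with aggregates such as "Africa Eastern and Southern", which in the output above have "Aggregates" in their region column. A sketch of keeping only individual countries, on a hypothetical three-row excerpt with the same column names (the extra whitespace in the names is made up to show what str.strip removes):

```python
import pandas as pd

# Hypothetical excerpt mimicking some columns of wb.get_countries()
countries = pd.DataFrame({
    'iso3c': ['ABW', 'AFE', 'AFG'],
    'name': [' Aruba', 'Africa Eastern and Southern ', 'Afghanistan'],
    'region': ['Latin America & Caribbean', 'Aggregates', 'South Asia'],
})
countries['name'] = countries.name.str.strip()

# Aggregates (regions, income groups, ...) have region == 'Aggregates'
is_aggregate = countries.region.str.strip() == 'Aggregates'
only_countries = countries.loc[~is_aggregate].reset_index(drop=True)
print(only_countries)
```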

Let's use wb to find all the series whose name contains the word "population".

In [156]:
popvars = wb.search(string='population')
popvars
Out[156]:
id name unit source sourceNote sourceOrganization topics
24 1.1_ACCESS.ELECTRICITY.TOT Access to electricity (% of total population) Sustainable Energy for All Access to electricity is the percentage of pop... b'World Bank Global Electrification Database 2...
39 1.2_ACCESS.ELECTRICITY.RURAL Access to electricity (% of rural population) Sustainable Energy for All Access to electricity is the percentage of rur... b'World Bank Global Electrification Database 2...
40 1.3_ACCESS.ELECTRICITY.URBAN Access to electricity (% of urban population) Sustainable Energy for All Access to electricity is the percentage of tot... b'World Bank Global Electrification Database 2...
164 2.1_ACCESS.CFT.TOT Access to Clean Fuels and Technologies for coo... Sustainable Energy for All b''
195 3.11.01.01.popcen Population census Statistical Capacity Indicators Population censuses collect data on the size, ... b'World Bank Microdata library. Original sourc...
... ... ... ... ... ... ... ...
20220 per_sionl.overlap_pop_urb Population only receiving All Social Insurance... The Atlas of Social Protection: Indicators of ... Percentage of population only receiving All So... b'ASPIRE' Social Protection & Labor
20221 per_sionl.overlap_q1_preT_tot Population in the 1st quintile (poorest) only ... The Atlas of Social Protection: Indicators of ... Percentage of population only receiving All So... b'ASPIRE' Social Protection & Labor
20222 per_sionl.overlap_q1_rur Population in the 1st quintile (poorest) only ... The Atlas of Social Protection: Indicators of ... Percentage of population only receiving All So... b'ASPIRE' Social Protection & Labor
20223 per_sionl.overlap_q1_tot Population in the 1st quintile (poorest) only ... The Atlas of Social Protection: Indicators of ... Percentage of population only receiving All So... b'ASPIRE' Social Protection & Labor
20224 per_sionl.overlap_q1_urb Population in the 1st quintile (poorest) only ... The Atlas of Social Protection: Indicators of ... Percentage of population only receiving All So... b'ASPIRE' Social Protection & Labor

2242 rows × 7 columns

Lots of variables are available, collected by the WB from multiple sources. The WB website has more information on each of them, and you can use it to search for the variables you may want to focus on. Here let's download several population series for the year 2020:

- the number of males and females in the population by age group;
- the total population;
- the population in urban agglomerations of more than 1 million (EN.URB.MCTY);
- the population in the largest city (EN.URB.LCTY).

In [157]:
femalepop = popvars.loc[popvars.id.apply(lambda x: x.find('SP.POP.')!=-1 and x.endswith('FE'))]
malepop = popvars.loc[popvars.id.apply(lambda x: x.find('SP.POP.')!=-1 and x.endswith('MA'))]
popfields = ['SP.POP.0014.FE.IN', 'SP.POP.1564.FE.IN', 'SP.POP.65UP.FE.IN',
             'SP.POP.0014.MA.IN', 'SP.POP.1564.MA.IN', 'SP.POP.65UP.MA.IN',
             'SP.POP.TOTL.FE.IN', 'SP.POP.TOTL.MA.IN', 'SP.POP.TOTL',
             'EN.URB.MCTY', 'EN.URB.LCTY'] + malepop.id.tolist() + femalepop.id.tolist()
popfields
Out[157]:
['SP.POP.0014.FE.IN',
 'SP.POP.1564.FE.IN',
 'SP.POP.65UP.FE.IN',
 'SP.POP.0014.MA.IN',
 'SP.POP.1564.MA.IN',
 'SP.POP.65UP.MA.IN',
 'SP.POP.TOTL.FE.IN',
 'SP.POP.TOTL.MA.IN',
 'SP.POP.TOTL',
 'EN.URB.MCTY',
 'EN.URB.LCTY',
 'SP.POP.0004.MA',
 'SP.POP.0509.MA',
 'SP.POP.1014.MA',
 'SP.POP.1519.MA',
 'SP.POP.2024.MA',
 'SP.POP.2529.MA',
 'SP.POP.3034.MA',
 'SP.POP.3539.MA',
 'SP.POP.4044.MA',
 'SP.POP.4549.MA',
 'SP.POP.5054.MA',
 'SP.POP.5559.MA',
 'SP.POP.6064.MA',
 'SP.POP.6569.MA',
 'SP.POP.7074.MA',
 'SP.POP.7579.MA',
 'SP.POP.80UP.MA',
 'SP.POP.0004.FE',
 'SP.POP.0509.FE',
 'SP.POP.1014.FE',
 'SP.POP.1519.FE',
 'SP.POP.2024.FE',
 'SP.POP.2529.FE',
 'SP.POP.3034.FE',
 'SP.POP.3539.FE',
 'SP.POP.4044.FE',
 'SP.POP.4549.FE',
 'SP.POP.5054.FE',
 'SP.POP.5559.FE',
 'SP.POP.6064.FE',
 'SP.POP.6569.FE',
 'SP.POP.7074.FE',
 'SP.POP.7579.FE',
 'SP.POP.80UP.FE']
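The cell above keeps indicator ids that contain SP.POP. and end in a sex suffix (FE or MA), using a lambda applied row by row. Below is a minimal sketch of the same filter using pandas' vectorized string methods. It runs on a small hypothetical table built from a handful of ids copied from the outputs above, not on the full search result:

```python
import pandas as pd

# Hypothetical stand-in for the result of wb.search(); only the 'id' column matters here
toy_vars = pd.DataFrame({'id': ['SP.POP.0004.MA', 'SP.POP.0004.FE', 'SP.POP.TOTL',
                                'SP.POP.TOTL.FE.IN', '3.11.01.01.popcen']})

# Vectorized equivalent of x.find('SP.POP.') != -1 and x.endswith(...)
is_pop = toy_vars.id.str.contains('SP.POP.', regex=False)
toy_female = toy_vars.loc[is_pop & toy_vars.id.str.endswith('FE')]
toy_male = toy_vars.loc[is_pop & toy_vars.id.str.endswith('MA')]

print(toy_female.id.tolist())  # ['SP.POP.0004.FE']
print(toy_male.id.tolist())    # ['SP.POP.0004.MA']
```

Passing regex=False matters. With the default regular-expression matching, each dot in SP.POP. would match any character. Note also that SP.POP.TOTL.FE.IN is not picked up, because it ends in IN. This is why the cell above lists the .IN totals explicitly.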

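Let's also download GDP per capita, PPP, in constant international dollars, which is the series NY.GDP.PCAP.PP.KD. The World Bank periodically rebases this series to a newer PPP benchmark year, so check the base year of the vintage you download.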

In [163]:
wdi = wb.download(indicator=popfields+['NY.GDP.PCAP.PP.KD'], country=wbcountries.iso2c.values, start=2020, end=2020)

wdi
/Users/ozak/anaconda3/envs/EconGrowthUG/lib/python3.9/site-packages/pandas_datareader/wb.py:592: UserWarning: Non-standard ISO country codes: 1A, 1W, 4E, 6F, 6N, 6X, 7E, 8S, A4, A5, A9, B1, B2, B3, B4, B6, B7, B8, C4, C5, C6, C7, C8, C9, D2, D3, D4, D5, D6, D7, D8, D9, EU, F1, F6, JG, M1, M2, N6, OE, R6, S1, S2, S3, S4, T2, T3, T4, T5, T6, T7, V1, V2, V3, V4, XC, XD, XE, XF, XG, XH, XI, XJ, XK, XL, XM, XN, XO, XP, XQ, XT, XU, XY, Z4, Z7, ZB, ZF, ZG, ZH, ZI, ZJ, ZQ, ZT
  warnings.warn(
Out[163]:
SP.POP.0014.FE.IN SP.POP.1564.FE.IN SP.POP.65UP.FE.IN SP.POP.0014.MA.IN SP.POP.1564.MA.IN SP.POP.65UP.MA.IN SP.POP.TOTL.FE.IN SP.POP.TOTL.MA.IN SP.POP.TOTL EN.URB.MCTY ... SP.POP.4044.FE SP.POP.4549.FE SP.POP.5054.FE SP.POP.5559.FE SP.POP.6064.FE SP.POP.6569.FE SP.POP.7074.FE SP.POP.7579.FE SP.POP.80UP.FE NY.GDP.PCAP.PP.KD
country year
Aruba 2020 9113.0 38000.0 9011.0 9444.0 34616.0 6582.0 56124.0 50642.0 106766.0 NaN ... 3712.0 4054.0 4403.0 4897.0 4196.0 3363.0 2406.0 1549.0 1693.0 29563.756955
Africa Eastern and Southern 2020 138352035.0 190472197.0 12266504.0 140988093.0 185894168.0 9270302.0 341090736.0 336152563.0 677243299.0 NaN ... 15669437.0 12697054.0 10181631.0 8211128.0 6530132.0 4922064.0 3486260.0 2186612.0 1671566.0 3491.978104
Afghanistan 2020 7947146.0 10452059.0 552871.0 8333700.0 11164198.0 478367.0 18952076.0 19976265.0 38928341.0 4221532.0 ... 790703.0 627510.0 503171.0 396460.0 307557.0 229221.0 172197.0 91107.0 60346.0 1970.560169
Africa Western and Central 2020 97264292.0 123798711.0 6922424.0 100597279.0 124357945.0 5862825.0 227985427.0 230818049.0 458803476.0 NaN ... 10315219.0 8341291.0 6685492.0 5368142.0 4163670.0 3046986.0 2034894.0 1173845.0 666699.0 4021.293919
Angola 2020 7590076.0 8616194.0 399128.0 7658353.0 8281395.0 321122.0 16605398.0 16260870.0 32866268.0 8329798.0 ... 684932.0 561290.0 410848.0 344802.0 252683.0 158692.0 117389.0 69351.0 53696.0 6141.963839
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Kosovo 2020 NaN NaN NaN NaN NaN NaN NaN NaN 1790133.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 10706.513976
Yemen, Rep. 2020 5675675.0 8647645.0 477663.0 5906414.0 8722038.0 396533.0 14800983.0 15024985.0 29825968.0 2972988.0 ... 667208.0 491438.0 409632.0 350039.0 265461.0 196233.0 140734.0 80598.0 60098.0 NaN
South Africa 2020 8444602.0 19666314.0 1981762.0 8636966.0 19293233.0 1285813.0 30092678.0 29216012.0 59308690.0 21272284.0 ... 1917268.0 1655021.0 1401326.0 1183834.0 990310.0 784770.0 535895.0 367665.0 293432.0 12665.832404
Zambia 2020 4008980.0 5033425.0 238545.0 4083160.0 4866437.0 153409.0 9280950.0 9103006.0 18383956.0 2774133.0 ... 410788.0 307992.0 223199.0 172863.0 130471.0 95899.0 67563.0 42945.0 32138.0 3277.816510
Zimbabwe 2020 3102450.0 4380879.0 287588.0 3126770.0 3804836.0 160404.0 7770917.0 7092010.0 14862927.0 1529920.0 ... 397612.0 280771.0 210343.0 173037.0 140977.0 113324.0 72322.0 56587.0 45355.0 2134.555461

266 rows × 46 columns

It looks like there are lots of missing values... but don't be fooled. This is a quirk of wb. Because the series come from different original sources, wb does not always link the rows to the right country. Let's see this by sorting the index.

In [164]:
wdi.sort_index()
Out[164]:
SP.POP.0014.FE.IN SP.POP.1564.FE.IN SP.POP.65UP.FE.IN SP.POP.0014.MA.IN SP.POP.1564.MA.IN SP.POP.65UP.MA.IN SP.POP.TOTL.FE.IN SP.POP.TOTL.MA.IN SP.POP.TOTL EN.URB.MCTY ... SP.POP.4044.FE SP.POP.4549.FE SP.POP.5054.FE SP.POP.5559.FE SP.POP.6064.FE SP.POP.6569.FE SP.POP.7074.FE SP.POP.7579.FE SP.POP.80UP.FE NY.GDP.PCAP.PP.KD
country year
Afghanistan 2020 7947146.0 1.045206e+07 552871.0 8.333700e+06 1.116420e+07 478367.0 1.895208e+07 1.997626e+07 3.892834e+07 4221532.0 ... 790703.0 627510.0 503171.0 396460.0 307557.0 229221.0 172197.0 91107.0 60346.0 1970.560169
Africa Eastern and Southern 2020 138352035.0 1.904722e+08 12266504.0 1.409881e+08 1.858942e+08 9270302.0 3.410907e+08 3.361526e+08 6.772433e+08 NaN ... 15669437.0 12697054.0 10181631.0 8211128.0 6530132.0 4922064.0 3486260.0 2186612.0 1671566.0 3491.978104
Africa Western and Central 2020 97264292.0 1.237987e+08 6922424.0 1.005973e+08 1.243579e+08 5862825.0 2.279854e+08 2.308180e+08 4.588035e+08 NaN ... 10315219.0 8341291.0 6685492.0 5368142.0 4163670.0 3046986.0 2034894.0 1173845.0 666699.0 4021.293919
Albania 2020 231501.0 9.446380e+05 217330.0 2.576260e+05 9.867900e+05 199964.0 1.393469e+06 1.444380e+06 2.837849e+06 NaN ... 75550.0 88992.0 94353.0 101819.0 93972.0 72919.0 54917.0 44481.0 45013.0 13253.733337
Algeria 2020 6608900.0 1.358423e+07 1504101.0 6.889992e+06 1.381108e+07 1452738.0 2.169724e+07 2.215381e+07 4.385104e+07 2767661.0 ... 1502809.0 1220495.0 1035144.0 870269.0 684035.0 560160.0 364221.0 272083.0 307637.0 10735.382539
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
West Bank and Gaza 2020 900632.0 1.384634e+06 82225.0 9.419560e+05 1.421572e+06 72250.0 2.367491e+06 2.435778e+06 4.803269e+06 NaN ... 114483.0 93717.0 78325.0 61960.0 42418.0 31964.0 22840.0 15132.0 12289.0 5402.538773
World 2020 956900030.0 2.492744e+09 398610418.0 1.020597e+09 2.567276e+09 324985413.0 3.848254e+09 3.912859e+09 7.763933e+09 NaN ... 243289551.0 236807645.0 221792957.0 194492100.0 164041353.0 139814706.0 100946603.0 68683572.0 89165540.0 16269.969854
Yemen, Rep. 2020 5675675.0 8.647645e+06 477663.0 5.906414e+06 8.722038e+06 396533.0 1.480098e+07 1.502498e+07 2.982597e+07 2972988.0 ... 667208.0 491438.0 409632.0 350039.0 265461.0 196233.0 140734.0 80598.0 60098.0 NaN
Zambia 2020 4008980.0 5.033425e+06 238545.0 4.083160e+06 4.866437e+06 153409.0 9.280950e+06 9.103006e+06 1.838396e+07 2774133.0 ... 410788.0 307992.0 223199.0 172863.0 130471.0 95899.0 67563.0 42945.0 32138.0 3277.816510
Zimbabwe 2020 3102450.0 4.380879e+06 287588.0 3.126770e+06 3.804836e+06 160404.0 7.770917e+06 7.092010e+06 1.486293e+07 1529920.0 ... 397612.0 280771.0 210343.0 173037.0 140977.0 113324.0 72322.0 56587.0 45355.0 2134.555461

266 rows × 46 columns

Let's aggregate by country-year so that we have the correct data. Since max() skips missing values, each country-year keeps the available value of every series.

In [165]:
wdi = wdi.groupby(['country', 'year']).max()
wdi.reset_index(inplace=True)
wdi
Out[165]:
country year SP.POP.0014.FE.IN SP.POP.1564.FE.IN SP.POP.65UP.FE.IN SP.POP.0014.MA.IN SP.POP.1564.MA.IN SP.POP.65UP.MA.IN SP.POP.TOTL.FE.IN SP.POP.TOTL.MA.IN ... SP.POP.4044.FE SP.POP.4549.FE SP.POP.5054.FE SP.POP.5559.FE SP.POP.6064.FE SP.POP.6569.FE SP.POP.7074.FE SP.POP.7579.FE SP.POP.80UP.FE NY.GDP.PCAP.PP.KD
0 Afghanistan 2020 7947146.0 1.045206e+07 552871.0 8.333700e+06 1.116420e+07 478367.0 1.895208e+07 1.997626e+07 ... 790703.0 627510.0 503171.0 396460.0 307557.0 229221.0 172197.0 91107.0 60346.0 1970.560169
1 Africa Eastern and Southern 2020 138352035.0 1.904722e+08 12266504.0 1.409881e+08 1.858942e+08 9270302.0 3.410907e+08 3.361526e+08 ... 15669437.0 12697054.0 10181631.0 8211128.0 6530132.0 4922064.0 3486260.0 2186612.0 1671566.0 3491.978104
2 Africa Western and Central 2020 97264292.0 1.237987e+08 6922424.0 1.005973e+08 1.243579e+08 5862825.0 2.279854e+08 2.308180e+08 ... 10315219.0 8341291.0 6685492.0 5368142.0 4163670.0 3046986.0 2034894.0 1173845.0 666699.0 4021.293919
3 Albania 2020 231501.0 9.446380e+05 217330.0 2.576260e+05 9.867900e+05 199964.0 1.393469e+06 1.444380e+06 ... 75550.0 88992.0 94353.0 101819.0 93972.0 72919.0 54917.0 44481.0 45013.0 13253.733337
4 Algeria 2020 6608900.0 1.358423e+07 1504101.0 6.889992e+06 1.381108e+07 1452738.0 2.169724e+07 2.215381e+07 ... 1502809.0 1220495.0 1035144.0 870269.0 684035.0 560160.0 364221.0 272083.0 307637.0 10735.382539
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
261 West Bank and Gaza 2020 900632.0 1.384634e+06 82225.0 9.419560e+05 1.421572e+06 72250.0 2.367491e+06 2.435778e+06 ... 114483.0 93717.0 78325.0 61960.0 42418.0 31964.0 22840.0 15132.0 12289.0 5402.538773
262 World 2020 956900030.0 2.492744e+09 398610418.0 1.020597e+09 2.567276e+09 324985413.0 3.848254e+09 3.912859e+09 ... 243289551.0 236807645.0 221792957.0 194492100.0 164041353.0 139814706.0 100946603.0 68683572.0 89165540.0 16269.969854
263 Yemen, Rep. 2020 5675675.0 8.647645e+06 477663.0 5.906414e+06 8.722038e+06 396533.0 1.480098e+07 1.502498e+07 ... 667208.0 491438.0 409632.0 350039.0 265461.0 196233.0 140734.0 80598.0 60098.0 NaN
264 Zambia 2020 4008980.0 5.033425e+06 238545.0 4.083160e+06 4.866437e+06 153409.0 9.280950e+06 9.103006e+06 ... 410788.0 307992.0 223199.0 172863.0 130471.0 95899.0 67563.0 42945.0 32138.0 3277.816510
265 Zimbabwe 2020 3102450.0 4.380879e+06 287588.0 3.126770e+06 3.804836e+06 160404.0 7.770917e+06 7.092010e+06 ... 397612.0 280771.0 210343.0 173037.0 140977.0 113324.0 72322.0 56587.0 45355.0 2134.555461

266 rows × 48 columns
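Why does max() fix the problem? Suppose the same country-year is split across several rows, each filled in for only some of the series. Within each group, max() skips NaN and keeps the value that is available. Below is a minimal sketch on a hypothetical split of that kind, with values copied from the outputs above:

```python
import numpy as np
import pandas as pd

# Hypothetical split: Aruba's 2020 data spread over two rows, each with only one series filled
toy_raw = pd.DataFrame({
    'country': ['Aruba', 'Aruba', 'Zambia'],
    'year': [2020, 2020, 2020],
    'SP.POP.TOTL': [106766.0, np.nan, 18383956.0],
    'NY.GDP.PCAP.PP.KD': [np.nan, 29563.756955, 3277.816510],
})

# max() ignores NaN within each group, so every country-year keeps its available values
toy_clean = toy_raw.groupby(['country', 'year']).max().reset_index()
print(toy_clean)
```

groupby(...).first() would work just as well, since it also skips missing values. The two only differ if two rows report different non-missing values for the same series.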

Let's merge this data with the original wbcountries dataframe, so that we also have each country's codes, region and income level available for plotting.

In [166]:
wdi = wbcountries.merge(wdi, left_on='name', right_on='country')
wdi
Out[166]:
iso3c iso2c name region adminregion incomeLevel lendingType capitalCity longitude latitude ... SP.POP.4044.FE SP.POP.4549.FE SP.POP.5054.FE SP.POP.5559.FE SP.POP.6064.FE SP.POP.6569.FE SP.POP.7074.FE SP.POP.7579.FE SP.POP.80UP.FE NY.GDP.PCAP.PP.KD
0 ABW AW Aruba Latin America & Caribbean High income Not classified Oranjestad -70.0167 12.51670 ... 3712.0 4054.0 4403.0 4897.0 4196.0 3363.0 2406.0 1549.0 1693.0 29563.756955
1 AFE ZH Africa Eastern and Southern Aggregates Aggregates Aggregates NaN NaN ... 15669437.0 12697054.0 10181631.0 8211128.0 6530132.0 4922064.0 3486260.0 2186612.0 1671566.0 3491.978104
2 AFG AF Afghanistan South Asia South Asia Low income IDA Kabul 69.1761 34.52280 ... 790703.0 627510.0 503171.0 396460.0 307557.0 229221.0 172197.0 91107.0 60346.0 1970.560169
3 AFW ZI Africa Western and Central Aggregates Aggregates Aggregates NaN NaN ... 10315219.0 8341291.0 6685492.0 5368142.0 4163670.0 3046986.0 2034894.0 1173845.0 666699.0 4021.293919
4 AGO AO Angola Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income IBRD Luanda 13.2420 -8.81155 ... 684932.0 561290.0 410848.0 344802.0 252683.0 158692.0 117389.0 69351.0 53696.0 6141.963839
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
261 XKX XK Kosovo Europe & Central Asia Europe & Central Asia (excluding high income) Upper middle income IDA Pristina 20.9260 42.56500 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 10706.513976
262 YEM YE Yemen, Rep. Middle East & North Africa Middle East & North Africa (excluding high inc... Low income IDA Sana'a 44.2075 15.35200 ... 667208.0 491438.0 409632.0 350039.0 265461.0 196233.0 140734.0 80598.0 60098.0 NaN
263 ZAF ZA South Africa Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Upper middle income IBRD Pretoria 28.1871 -25.74600 ... 1917268.0 1655021.0 1401326.0 1183834.0 990310.0 784770.0 535895.0 367665.0 293432.0 12665.832404
264 ZMB ZM Zambia Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Low income IDA Lusaka 28.2937 -15.39820 ... 410788.0 307992.0 223199.0 172863.0 130471.0 95899.0 67563.0 42945.0 32138.0 3277.816510
265 ZWE ZW Zimbabwe Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income Blend Harare 31.0672 -17.83120 ... 397612.0 280771.0 210343.0 173037.0 140977.0 113324.0 72322.0 56587.0 45355.0 2134.555461

266 rows × 58 columns
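The merge above is an inner join on country names. Any country whose name is spelled differently in the two dataframes is therefore dropped silently. Below is a minimal sketch of how to spot such rows with an outer merge and indicator=True. It uses hypothetical toy data, and the 'Bahamas' spelling mismatch is invented for illustration:

```python
import pandas as pd

# Hypothetical miniatures of wbcountries and wdi; 'Bahamas' vs 'Bahamas, The' is an invented mismatch
toy_countries = pd.DataFrame({'iso3c': ['ABW', 'ZMB', 'BHS'],
                              'name': ['Aruba', 'Zambia', 'Bahamas, The']})
toy_data = pd.DataFrame({'country': ['Aruba', 'Zambia', 'Bahamas'],
                         'value': [1.0, 2.0, 3.0]})

# The default (inner) merge silently drops the mismatched country
toy_inner = toy_countries.merge(toy_data, left_on='name', right_on='country')

# An outer merge with indicator=True flags rows that found no partner
toy_check = toy_countries.merge(toy_data, left_on='name', right_on='country',
                                how='outer', indicator=True)
toy_unmatched = toy_check.loc[toy_check['_merge'] != 'both', ['name', 'country', '_merge']]
print(len(toy_inner))  # 2
print(toy_unmatched)
```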

Plot Male vs Female population in each country in 2020

In [167]:
PersistencePlot(wdi, var0='SP.POP.TOTL.FE.IN', var1='SP.POP.TOTL.MA.IN', xlabel='Number of Females',
                ylabel='Number of Males', labelvar='iso3c', linelabel='Female-Male', 
                dx=0.1, dy=0.1, filename='Female-Male-2017.pdf')

Let's take logs so we can see this better.

In [168]:
wdi['lpop_fe'] = np.log(wdi['SP.POP.TOTL.FE.IN'])
wdi['lpop_ma'] = np.log(wdi['SP.POP.TOTL.MA.IN'])
PersistencePlot(wdi, var0='lpop_fe', var1='lpop_ma', xlabel='Log[Number of Females]',
                ylabel='Log[Number of Males]', labelvar='iso3c', linelabel='Female-Male', 
                dx=0.01, dy=0.01, filename='Female-Male-2020.pdf')

It seems that the gender ratio, i.e., the number of males per female, is quite different from 1 in some countries. Let's plot a histogram of the gender ratio across countries to see this better.

In [169]:
(np.exp(wdi['lpop_ma'] - wdi['lpop_fe'])).hist()
Out[169]:
<AxesSubplot:>
In [170]:
wdi['gender_ratio'] = (wdi['SP.POP.TOTL.MA.IN'] / wdi['SP.POP.TOTL.FE.IN'])
wdi.gender_ratio.hist()
Out[170]:
<AxesSubplot:>
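The two histograms above show the same variable. Since $\log M - \log F = \log(M/F)$, exponentiating the difference of logs recovers the ratio of males to females. Below is a quick numerical check using the 2020 totals for Aruba and Zambia from the output above:

```python
import numpy as np
import pandas as pd

# 2020 totals for Aruba and Zambia, copied from the output above
toy_males = pd.Series([50642.0, 9103006.0], index=['ABW', 'ZMB'])
toy_females = pd.Series([56124.0, 9280950.0], index=['ABW', 'ZMB'])

# exp(log M - log F) = M / F, up to floating-point rounding
via_logs = np.exp(np.log(toy_males) - np.log(toy_females))
direct = toy_males / toy_females
print(np.allclose(via_logs, direct))  # True
print(direct.round(6))
```

Aruba's ratio comes out at about 0.902323, which matches the value reported for ABW in the table of low gender ratios further down.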
In [172]:
print('Maximum gender ratio = ', wdi.gender_ratio.max())
wdi.loc[wdi.gender_ratio>=1.05][['iso3c', 'name', 'region', 'gender_ratio']].sort_values('gender_ratio', ascending=False)
Maximum gender ratio =  3.024248350036666
Out[172]:
iso3c name region gender_ratio
200 QAT Qatar Middle East & North Africa 3.024248
8 ARE United Arab Emirates Middle East & North Africa 2.238453
182 OMN Oman Middle East & North Africa 1.940938
22 BHR Bahrain Middle East & North Africa 1.830740
152 MDV Maldives South Asia 1.735343
127 KWT Kuwait Middle East & North Africa 1.578660
205 SAU Saudi Arabia Middle East & North Africa 1.371103
88 GNQ Equatorial Guinea Sub-Saharan Africa 1.253411
183 OSS Other small states Aggregates 1.139644
32 BTN Bhutan South Asia 1.134325
56 DJI Djibouti Middle East & North Africa 1.106574
218 SST Small states Aggregates 1.103985
208 SGP Singapore East Asia & Pacific 1.098337
109 IND India South Asia 1.081758
31 BRN Brunei Darussalam East Asia & Pacific 1.078344
7 ARB Arab World Aggregates 1.076027
153 MEA Middle East & North Africa Aggregates 1.075067
260 WSM Samoa East Asia & Pacific 1.073098
204 SAS South Asia Aggregates 1.067193
240 TSA South Asia (IDA & IBRD) Aggregates 1.067193
184 PAK Pakistan South Asia 1.060172
169 MYS Malaysia East Asia & Pacific 1.056919
2 AFG Afghanistan South Asia 1.054041
40 CHN China East Asia & Pacific 1.053020
226 SYC Seychelles Sub-Saharan Africa 1.053003
In [173]:
print('Minimum gender ratio = ', wdi.gender_ratio.min())
wdi.loc[wdi.gender_ratio<=0.95][['iso3c', 'name', 'region', 'gender_ratio']].sort_values('gender_ratio')
Minimum gender ratio =  0.8454598203374091
Out[173]:
iso3c name region gender_ratio
178 NPL Nepal South Asia 0.845460
96 HKG Hong Kong SAR, China East Asia & Pacific 0.847768
51 CUW Curacao Latin America & Caribbean 0.851506
145 LVA Latvia Europe & Central Asia 0.854916
143 LTU Lithuania Europe & Central Asia 0.861644
248 UKR Ukraine Europe & Central Asia 0.863341
202 RUS Russian Federation Europe & Central Asia 0.863925
25 BLR Belarus Europe & Central Asia 0.871170
211 SLV El Salvador Latin America & Caribbean 0.880180
10 ARM Armenia Europe & Central Asia 0.888027
194 PRT Portugal Europe & Central Asia 0.897884
71 EST Estonia Europe & Central Asia 0.900260
192 PRI Puerto Rico Latin America & Caribbean 0.900483
0 ABW Aruba Latin America & Caribbean 0.902323
256 VIR Virgin Islands (U.S.) Latin America & Caribbean 0.904770
101 HUN Hungary Europe & Central Asia 0.908382
82 GEO Georgia Europe & Central Asia 0.910572
265 ZWE Zimbabwe Sub-Saharan Africa 0.912635
64 ECA Europe & Central Asia (excluding high income) Aggregates 0.918212
150 MDA Moldova Europe & Central Asia 0.919097
138 LKA Sri Lanka South Asia 0.921177
231 TEC Europe & Central Asia (IDA & IBRD countries) Aggregates 0.921263
146 MAC Macao SAR, China East Asia & Pacific 0.925448
160 MMR Myanmar East Asia & Pacific 0.930137
99 HRV Croatia Europe & Central Asia 0.931004
12 ATG Antigua and Barbuda Latin America & Caribbean 0.933082
250 URY Uruguay Latin America & Caribbean 0.934803
85 GIN Guinea Sub-Saharan Africa 0.937116
77 FRA France Europe & Central Asia 0.937801
30 BRB Barbados Latin America & Caribbean 0.937938
36 CEB Central Europe and the Baltics Aggregates 0.938224
33 BWA Botswana Sub-Saharan Africa 0.939443
190 POL Poland Europe & Central Asia 0.939984
171 NAM Namibia Sub-Saharan Africa 0.940767
65 ECS Europe & Central Asia Aggregates 0.941984
120 KAZ Kazakhstan Europe & Central Asia 0.943001
21 BGR Bulgaria Europe & Central Asia 0.943989
23 BHS Bahamas, The Latin America & Caribbean 0.945280
165 MOZ Mozambique Sub-Saharan Africa 0.945294
201 ROU Romania Europe & Central Asia 0.946391
233 THA Thailand East Asia & Pacific 0.947875
116 ITA Italy Europe & Central Asia 0.948866
221 SVK Slovak Republic Europe & Central Asia 0.949062
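The two cells above select countries using a threshold on the gender ratio. If you would rather list a fixed number of extreme countries, nlargest and nsmallest do it in one step. Below is a sketch on a hypothetical miniature of wdi, using a few ratios copied from the tables above:

```python
import pandas as pd

# Hypothetical miniature of wdi with a few gender ratios copied from the tables above
toy_ratios = pd.DataFrame({'iso3c': ['QAT', 'ARE', 'NPL', 'HKG', 'FRA'],
                           'gender_ratio': [3.024248, 2.238453, 0.845460, 0.847768, 0.937801]})

# Threshold filters, as in the cells above
toy_high = toy_ratios.loc[toy_ratios.gender_ratio >= 1.05].sort_values('gender_ratio', ascending=False)
toy_low = toy_ratios.loc[toy_ratios.gender_ratio <= 0.95].sort_values('gender_ratio')

# A fixed number of extremes instead of a threshold (results come back already sorted)
toy_top2 = toy_ratios.nlargest(2, 'gender_ratio')
toy_bottom2 = toy_ratios.nsmallest(2, 'gender_ratio')
print(toy_top2.iso3c.tolist(), toy_bottom2.iso3c.tolist())  # ['QAT', 'ARE'] ['NPL', 'HKG']
```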

Gender ratio and development

In [174]:
wdi['lgdppc'] = np.log(wdi['NY.GDP.PCAP.PP.KD'])
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.scatterplot(x='lgdppc', y='gender_ratio', hue='region',
                hue_order=['East Asia & Pacific', 'Europe & Central Asia',
                           'Latin America & Caribbean ', 'Middle East & North Africa',
                           'North America', 'South Asia', 'Sub-Saharan Africa '],
                data=wdi.loc[wdi.region!='Aggregates'], alpha=1, style='incomeLevel', 
                style_order=['High income', 'Upper middle income', 'Lower middle income', 'Low income'],
                )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Log[GDP per capita]')
ax.set_ylabel('Gender Ratio')
plt.savefig(pathgraphs + 'Gender-Ratio-GDPpc.pdf', dpi=300, bbox_inches='tight')
In [175]:
fig
Out[175]:

Use statistical and mathematical functions to analyze the data

Now let's import the statsmodels module to run regressions.

In [176]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
from IPython.display import Latex

Let's estimate the elasticity of the number of men with respect to the number of women.
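In a regression of logs on logs, the slope is an elasticity. Write $M_i$ and $F_i$ for the number of males and females in country $i$. The next cell then estimates

$$\log M_i = \alpha + \beta \log F_i + \varepsilon_i, \qquad \beta = \frac{\partial \log M_i}{\partial \log F_i} \approx \frac{\%\Delta M_i}{\%\Delta F_i},$$

so $\beta$ is the percent change in the number of males associated with a 1% change in the number of females. A $\beta$ close to 1 means the two populations scale roughly proportionally across countries.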

In [177]:
mod = sm.OLS(wdi['lpop_ma'],sm.add_constant(wdi['lpop_fe']), missing='drop').fit()
mod.summary2()
Out[177]:
Model: OLS Adj. R-squared: 0.998
Dependent Variable: lpop_ma AIC: -304.1073
Date: 2022-08-18 14:17 BIC: -297.1377
No. Observations: 241 Log-Likelihood: 154.05
Df Model: 1 F-statistic: 1.051e+05
Df Residuals: 239 Prob (F-statistic): 5.70e-318
R-squared: 0.998 Scale: 0.016441
Coef. Std.Err. t P>|t| [0.025 0.975]
const 0.0590 0.0499 1.1829 0.2380 -0.0393 0.1573
lpop_fe 0.9967 0.0031 324.1164 0.0000 0.9906 1.0027
Omnibus: 281.246 Durbin-Watson: 1.948
Prob(Omnibus): 0.000 Jarque-Bera (JB): 11562.963
Skew: 5.012 Prob(JB): 0.000
Kurtosis: 35.419 Condition No.: 98
In [178]:
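# Look up the coefficient by name (positional indexing with [] on a Series is deprecated in recent pandas)
print('The elasticity is %8.4f' % mod.params['lpop_fe'])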
print(r'The $R^2$ is %8.3f' % mod.rsquared)
The elasticity is   0.9967
The $R^2$ is    0.998

Let's now use the smf module instead. It lets us write the regression as a formula, rather than passing the data and adding the constant as a separate variable ourselves. Let's run a simple regression of $\log(GDPpc)$ on the gender ratio.

In [179]:
mod = smf.ols(formula='lgdppc ~ gender_ratio', data=wdi[['lpop_ma','lpop_fe', 'lgdppc', 'gender_ratio']], missing='drop').fit()
mod.summary2()
Out[179]:
Model: OLS Adj. R-squared: 0.021
Dependent Variable: lgdppc AIC: 690.6298
Date: 2022-08-18 14:17 BIC: 697.4972
No. Observations: 229 Log-Likelihood: -343.31
Df Model: 1 F-statistic: 5.775
Df Residuals: 227 Prob (F-statistic): 0.0171
R-squared: 0.025 Scale: 1.1844
Coef. Std.Err. t P>|t| [0.025 0.975]
Intercept 8.4370 0.3809 22.1526 0.0000 7.6865 9.1875
gender_ratio 0.8829 0.3674 2.4032 0.0171 0.1590 1.6069
Omnibus: 15.634 Durbin-Watson: 1.815
Prob(Omnibus): 0.000 Jarque-Bera (JB): 7.515
Skew: -0.227 Prob(JB): 0.023
Kurtosis: 2.238 Condition No.: 11
In [180]:
mysummary=mod.summary2()
Latex(mysummary.as_latex())
Out[180]:
\begin{table} \caption{Results: Ordinary least squares} \label{} \begin{center} \begin{tabular}{llll} \hline Model: & OLS & Adj. R-squared: & 0.021 \\ Dependent Variable: & lgdppc & AIC: & 690.6298 \\ Date: & 2022-08-18 14:17 & BIC: & 697.4972 \\ No. Observations: & 229 & Log-Likelihood: & -343.31 \\ Df Model: & 1 & F-statistic: & 5.775 \\ Df Residuals: & 227 & Prob (F-statistic): & 0.0171 \\ R-squared: & 0.025 & Scale: & 1.1844 \\ \hline \end{tabular} \end{center} \begin{center} \begin{tabular}{lrrrrrr} \hline & Coef. & Std.Err. & t & P$> |$t$|$ & [0.025 & 0.975] \\ \hline Intercept & 8.4370 & 0.3809 & 22.1526 & 0.0000 & 7.6865 & 9.1875 \\ gender\_ratio & 0.8829 & 0.3674 & 2.4032 & 0.0171 & 0.1590 & 1.6069 \\ \hline \end{tabular} \end{center} \begin{center} \begin{tabular}{llll} \hline Omnibus: & 15.634 & Durbin-Watson: & 1.815 \\ Prob(Omnibus): & 0.000 & Jarque-Bera (JB): & 7.515 \\ Skew: & -0.227 & Prob(JB): & 0.023 \\ Kurtosis: & 2.238 & Condition No.: & 11 \\ \hline \end{tabular} \end{center} \end{table}
In [181]:
# Look up the coefficient by name rather than by position
print('The semi-elasticity is %2.4f' % mod.params['gender_ratio'])
print(r'The $R^2$ is %1.3f' % mod.rsquared)
The semi-elasticity is 0.8829
The $R^2$ is 0.025
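The dependent variable is in logs but the gender ratio $g_i$ is not, so the slope here is a semi-elasticity:

$$\log y_i = \alpha + \beta g_i + \varepsilon_i, \qquad \beta = \frac{\partial \log y_i}{\partial g_i},$$

where $y_i$ is GDP per capita. As a worked example, take $\hat\beta = 0.8829$. An increase of 0.1 in the gender ratio is then associated with $0.1 \times 0.8829 \approx 0.088$ log points, i.e., $e^{0.0883} - 1 \approx 9.2\%$ higher GDP per capita.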

But of course, correlation is not causation! Moreover, our figure suggests that the positive association is driven by the rich oil-producing countries of the Middle East & North Africa. To see this, let's replicate the analysis without those countries.

In [182]:
mod = smf.ols(formula='lgdppc ~ gender_ratio', data=wdi.loc[wdi.region!='Middle East & North Africa'][['lpop_ma','lpop_fe', 'lgdppc', 'gender_ratio']], missing='drop').fit()
mod.summary2()
Out[182]:
Model: OLS Adj. R-squared: 0.006
Dependent Variable: lgdppc AIC: 640.2134
Date: 2022-08-18 14:17 BIC: 646.9076
No. Observations: 210 Log-Likelihood: -318.11
Df Model: 1 F-statistic: 2.209
Df Residuals: 208 Prob (F-statistic): 0.139
R-squared: 0.011 Scale: 1.2230
Coef. Std.Err. t P>|t| [0.025 0.975]
Intercept 10.8201 1.0318 10.4869 0.0000 8.7860 12.8541
gender_ratio -1.5447 1.0394 -1.4861 0.1388 -3.5938 0.5044
Omnibus: 13.779 Durbin-Watson: 1.732
Prob(Omnibus): 0.001 Jarque-Bera (JB): 6.758
Skew: -0.224 Prob(JB): 0.034
Kurtosis: 2.244 Condition No.: 27
In [183]:
print('The semi-elasticity is %2.4f with a p-value of %1.4f' % (mod.params['gender_ratio'], mod.pvalues['gender_ratio']))
print(r'The $R^2$ is %1.3f' % mod.rsquared)
print("Luckily we had plotted the data, right?!")
The semi-elasticity is -1.5447 with a p-value of 0.1388
The $R^2$ is 0.011
Luckily we had plotted the data, right?!
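A few observations can drive the sign of a slope. Below is a small sketch on synthetic (made-up) data that shows how. It combines a group of typical observations with a negative relation and a handful of outliers with very high ratios and high outcomes. It uses np.polyfit for a simple least-squares line rather than statsmodels. The numbers are illustrative only and are not the WDI data.

```python
import numpy as np

# Synthetic (made-up) data: 50 typical observations where y falls as g rises...
rng = np.random.default_rng(1)
toy_g = rng.uniform(0.9, 1.1, size=50)
toy_y = 9.0 - 2.0 * (toy_g - 1.0) + rng.normal(scale=0.05, size=50)

# ...plus four outliers with very high g and high y
toy_g_out = np.array([1.8, 2.0, 2.5, 3.0])
toy_y_out = np.array([11.0, 11.0, 11.2, 11.0])

# np.polyfit(x, y, 1) returns [slope, intercept] of the least-squares line
slope_all = np.polyfit(np.concatenate([toy_g, toy_g_out]),
                       np.concatenate([toy_y, toy_y_out]), 1)[0]
slope_typical = np.polyfit(toy_g, toy_y, 1)[0]
print(f'slope with outliers: {slope_all:.2f}, without outliers: {slope_typical:.2f}')
```

Here the four outliers flip the sign of the fitted slope. This is the same kind of pattern the scatter plot and the two regressions above point to.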

Homework

Using Pandas and Statsmodels write a Jupyter Notebook that:

  1. Uses the data from the Maddison Project to plot the evolution of total population across the world.
  2. Plots the evolution of the share of the world population by countries and WB regions.
  3. Downloads fertility, mortality and life expectancy data from the WB and plots their evolution over the last 60 years.
  4. Downloads mortality and life expectancy data (across regions and cohorts) from the Human Mortality Database and plots their evolution.
  5. Uses this data to analyze the convergence of life expectancy, mortality and fertility.

Submit your notebook as a pull request to the course's GitHub repository.

Wages and Population In England 1200-1860

Let's get the population and wage series from Greg Clark's website for plotting.

In [184]:
uk1 = pd.read_excel('http://faculty.econ.ucdavis.edu/faculty/gclark/English%20Data/England%20NNI%20-%20Clark%20-%202015.xlsx', sheet_name='Decadal')
uk2 = pd.read_excel('http://faculty.econ.ucdavis.edu/faculty/gclark/English%20Data/Wages%202014.xlsx', sheet_name='Decadal')
In [185]:
uk1
Out[185]:
Decade Unnamed: 1 Pop England Share Males farm sector Male Farm Wage Male Non-Farm Wage Male average Wage Male Work Days per Year Total Wage Income Land rents ... All Capital Income Indirect Taxes Net National Income Unnamed: 15 Price Index - Domestic Expenditure Price Index - GDP Price Index - Cost of Living Unnamed: 19 Real Net National Income (DE) Real NNI/N
0 NaN NaN m. NaN d./day d./day d./day NaN (₤ m) (₤ m) ... (₤ m) (₤ m) (₤ m) NaN (1860s=100) (1860s=100) (1860s=100) NaN (1860s=100) (1860s=100)
1 1200.0 NaN 3.395946 0.555168 1.373647 2.282816 2.088783 300.0 3.078466 1.606036 ... 1.741253 0 6.425755 NaN 6.586338 7.126418 6.544197 NaN 14.897218 86.621351
2 1210.0 NaN 3.395946 0.575784 1.269451 1.84928 2.021137 300.0 3.200434 1.606036 ... 1.95638 0 6.76285 NaN 7.494729 8.109296 7.575843 NaN 14.042469 81.651332
3 1220.0 NaN 3.738005 0.626021 1.255379 2.135947 1.947335 300.0 3.394164 1.628947 ... 1.971441 0 6.994552 NaN 8.332736 9.016021 8.535567 NaN 13.143741 69.432007
4 1230.0 NaN 3.903905 0.652303 1.178929 NaN 1.848722 300.0 3.365295 1.331461 ... 2.04084 0 6.737596 NaN 8.265396 8.943159 8.40574 NaN 12.462355 63.034958
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
63 1820.0 NaN 11.982104 0.345313 20.333416 34.53486 34.327787 300.0 191.868124 38.191485 ... 78.77882 29.164632 338.003061 NaN 108.478968 112.086157 110.194354 NaN 48.128176 79.290718
64 1830.0 NaN 13.773176 0.308229 20.042939 35.383693 35.429759 300.0 227.645679 36.557278 ... 93.747986 25.876734 383.827677 NaN 100.892148 102.97158 101.268842 NaN 58.593182 84.035129
65 1840.0 NaN 15.636482 0.264763 21.096252 36.16764 37.016669 300.0 269.976598 39.165564 ... 101.875156 26.184313 437.201631 NaN 96.899076 97.81461 98.799054 NaN 69.558992 87.724658
66 1850.0 NaN 17.589614 0.246630 22.09969 37.840784 39.129929 300.0 321.386522 39.474329 ... 124.452112 28.390429 513.703392 NaN 93.317821 93.166374 95.128327 NaN 84.548996 94.905732
67 1860.0 NaN 19.722236 0.239390 23.625775 43.597919 44.659538 300.0 411.41326 43.176349 ... 168.819083 30.282961 653.691653 NaN 99.949265 99.955451 99.996226 NaN 100.343409 100.349161

68 rows × 22 columns

In [186]:
uk2
Out[186]:
Decade Farm Laborers, d/day Coal Miners, d./day Building Laborers, d/day Building Craftsmen, d/day Unnamed: 5 Cost of Living (1860s=100) Unnamed: 7 Real Farm Wage (1860s=100) Real Building Laborer Wage (1860s=100) Real Building Craftsman Wage (1860s=100)
0 1200 1.373647 NaN NaN 2.783922 NaN 6.544197 NaN 88.841573 NaN 80.673336
1 1210 1.262561 NaN NaN 2.078984 NaN 7.575843 NaN 72.045676 NaN 52.335306
2 1220 1.249455 NaN 1.625946 2.602945 NaN 8.535567 NaN 60.578574 51.791535 56.307104
3 1230 1.178929 NaN NaN NaN NaN 8.405740 NaN 59.258095 NaN NaN
4 1240 1.246828 NaN 1.878412 2.893921 NaN 8.871055 NaN 61.132054 58.464596 62.484216
... ... ... ... ... ... ... ... ... ... ... ...
62 1820 20.333416 32.226677 27.009300 42.060419 NaN 110.194354 NaN 78.081590 71.212912 72.500372
63 1830 20.042939 32.680000 28.021165 42.746221 NaN 101.268842 NaN 83.892814 80.390114 80.295861
64 1840 21.096252 30.920000 29.023687 43.311592 NaN 98.771980 NaN 90.604982 85.635493 83.439177
65 1850 22.099690 36.680000 30.103970 45.577598 NaN 95.128327 NaN 98.270928 92.231871 91.251668
66 1860 23.625775 41.760000 34.466257 52.729581 NaN 99.996226 NaN 100.013083 100.110361 100.049356

67 rows × 11 columns

Let's clean the data and merge it into a single dataframe.

In [187]:
# Drop row 0 (it holds units, not data) and the empty 'Unnamed' spacer columns
uk1 = uk1.drop(index=0).reset_index(drop=True)
uk1 = uk1.loc[:, ~uk1.columns.str.contains('Unnamed')]
uk2 = uk2.loc[:, ~uk2.columns.str.contains('Unnamed')]
uk = uk1.merge(uk2)
uk.Decade = uk.Decade.astype(int)
uk['Pop England'] = uk['Pop England'].astype(float)
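uk1.merge(uk2) is called without on=, so pandas joins on every column name the two dataframes share. Judging by the headers above, that is only Decade. Below is a minimal sketch of how to check the join keys and make them explicit. It uses hypothetical toy frames with values copied from the tables above:

```python
import pandas as pd

# Hypothetical toy versions of uk1 and uk2 (values copied from the first rows above)
toy_a = pd.DataFrame({'Decade': [1200.0, 1210.0],
                      # object dtype, as in the raw sheet where a units row sat on top
                      'Pop England': pd.Series([3.395946, 3.395946], dtype=object)})
toy_b = pd.DataFrame({'Decade': [1200, 1210],
                      'Real Farm Wage (1860s=100)': [88.841573, 72.045676]})

# Without on=, merge uses the columns the two frames have in common as keys
shared = toy_a.columns.intersection(toy_b.columns).tolist()
print(shared)  # ['Decade']

# Being explicit guards against joining on an unexpected shared column;
# validate='one_to_one' raises a MergeError if a decade appears twice in either frame
toy_uk = toy_a.merge(toy_b, on='Decade', validate='one_to_one')
toy_uk['Decade'] = toy_uk['Decade'].astype(int)
toy_uk['Pop England'] = toy_uk['Pop England'].astype(float)
print(toy_uk)
```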
In [188]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='Decade', y='Pop England', data=uk.loc[uk.Decade<1730], alpha=1, label='Population', color='r')
ax2 = ax.twinx()
sns.lineplot(x='Decade', y='Real Farm Wage (1860s=100)', data=uk.loc[uk.Decade<1730], alpha=1, label='Real Wages', color='b')
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
handles, labels = ax.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax.legend(handles=(handles+handles2), labels=(labels+labels2), loc='upper left')
ax2.legend(handles=(handles+handles2), labels=(labels+labels2), loc='upper left')
nticks = 7
ax.yaxis.set_major_locator(matplotlib.ticker.LinearLocator(nticks))
ax2.yaxis.set_major_locator(matplotlib.ticker.LinearLocator(nticks))
ax.set_xlabel('Year')
ax.set_ylabel('Population (millions)')
plt.savefig(pathgraphs + 'UK-pop-GDPpc-1200-1730.pdf', dpi=300, bbox_inches='tight')
In [189]:
fig
Out[189]:
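The cell above draws population and real wages on two y-axes and merges the legend entries of both axes. Below is a minimal sketch of the same twin-axis pattern in plain matplotlib. There, no legend exists until you ask for one, so a single combined legend on the twin axes is enough. The four data points are copied from the first rows of the tables above:

```python
import matplotlib.pyplot as plt

# Four decades copied from the uk1 and uk2 tables above
decades = [1200, 1210, 1220, 1230]
pop = [3.395946, 3.395946, 3.738005, 3.903905]
wage = [88.841573, 72.045676, 60.578574, 59.258095]

sketch_fig, sketch_ax = plt.subplots()
sketch_ax.plot(decades, pop, color='r', label='Population')
sketch_ax2 = sketch_ax.twinx()  # second y-axis sharing the same x-axis
sketch_ax2.plot(decades, wage, color='b', label='Real Wages')

# Collect the handles of both axes and draw one combined legend on the twin axes
handles, labels = sketch_ax.get_legend_handles_labels()
handles2, labels2 = sketch_ax2.get_legend_handles_labels()
sketch_ax2.legend(handles + handles2, labels + labels2, loc='upper left')

sketch_ax.set_xlabel('Year')
sketch_ax.set_ylabel('Population (millions)')
sketch_ax2.set_ylabel('Real Farm Wage (1860s=100)')
```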